SUL AI Studio

The Stanford Libraries hold rich and diverse digital collections of text, manuscript documents, photographs, maps, and much more. The staff at the Libraries are an equally rich resource for members of the campus community who want to explore these collections and understand their research value within a particular domain.

In July 2018 the Stanford Libraries started the AI Studio and began to surface projects where applications of artificial intelligence can assist staff with internal information processing and help make collections more discoverable and analyzable for researchers. So far the effort is not officially staffed; it is experimental, driven by the interest and enthusiasm of our library staff.

The list below includes projects that are underway.

See also recorded presentations of work in progress.

Projects Developed in 2018

Transcribing the Allen Ginsberg Tapes

Stanford Libraries holds the Allen Ginsberg papers, 1937-1994, which contain correspondence, manuscripts by Ginsberg and other poets and authors, business records, notebooks and journals, clipping files, books, periodicals, audiotapes, videotapes, photographs, and posters. The project is primarily a speech-to-text transcription effort, but the team is also interested in identifying and logging repeated signals in the recordings (for example, the tape recorder clicking on and off). There is also a need for handwriting transcription: Ginsberg's own notes about each tape include rich discovery information that the finding aid does not capture.

Entity Extraction for Single-Volume Novels of the 19th and Early 20th Centuries

The goal of this project is to make two large collections of under-collected, scarcely held single-volume novels (approximately 1,600 titles) more discoverable. The first collection, gathered by Wingfield's predecessor, Annette Keough, is the Jarndyce Collection (1,101 titles). This new material is currently served to patrons as a mass of undifferentiated texts. Because these are non-canonical authors, the novels lack detailed cataloguing information; they are not in Google Books and are not otherwise readily available.

Oceanographic Field Notebooks

This project is one of many in the Library where we need to liberate analog data, in this case from paper, into a digital medium that researchers can use. The material for this project is a series of reports produced over a period of 23 years, often handwritten and sometimes typed, that are part of the CalCOFI Hydrobiological Survey of Monterey Bay. Very few oceanographic time-series studies exist from the 1950s to the 1970s, and these particular data exist only at our location. They are an important contribution to studies in the marine sciences, climate change, and coastal ecology. Our library is located in a tsunami zone, and because we hold the only copy of these data, they are at significant risk of being lost.

Enabling Rich Analysis of Trial Records

The Handa Center for Human Rights and International Justice and the Stanford Libraries are working together, with funding from the William and Flora Hewlett Foundation, on three digital archives projects to make the records of international criminal proceedings publicly accessible. The goal is to develop platforms for research on war crimes trials; the challenge lies in converting a trove of heterogeneous print and manuscript materials to digital form so that they can be effectively discovered, read, and analyzed.

Çatalhöyük Image Repository

This project focuses on the image repository of the Çatalhöyük Archaeological Project. The images are being submitted for long-term preservation in the Stanford Digital Repository (SDR). The additional metadata generated through this project will improve discovery and provide a model for processing future images in a similar way.

Ottoman Manuscripts

Ali Yaycioglu, Associate Professor of History, has gathered probate inventories of wealthy and powerful individuals who died during the mid-century revolutions from the Balkans to Egypt. The documents record properties, debts, and credits that link people to one another. Individuals held debt relationships with entire villages or communities, which resulted in complicated negotiations, and a debt could be pardoned or sold to another private party. In this period people functioned much like firms, so the death of one individual could trigger a whole chain of events that can be traced through these documents.

University Archives: Project South

This project will focus on using speech-to-text recognition to generate transcripts for 220 digitized audio recordings associated with the KZSU Project South Interviews. During the summer of 1965, eight students from Stanford University spent ten weeks in the southern states tape-recording information on the civil rights movement. Sponsored by KZSU, Stanford's student radio station, the interviewers visited over fifty civil rights projects in six states and secured three hundred and thirty hours of audio recordings, including over two hundred hours of personal interviews.

Clustering and Classification of SUL Images

This project will explore the costs and benefits of "out-of-the-box" services versus more customized solutions. If we find that we need customized models to produce useful labels for our image collections, can we generalize those models enough to apply them not only to our own holdings but also to those of our partner institutions? And can we develop models tailored to particular types of image collections that offer valuable specialized results?