Library-Inspired Artificial Intelligence: Discovery, Part 1

October 22, 2018
Catherine Nicole Coleman
Image cluster from Teenie Harris Photograph Archive showing people in fur coats.

In December of this year, Stanford Libraries will co-host a conference on artificial intelligence with the National Library of Norway in Oslo. In preparation for that event, Svein Arne Brygfjeld, who leads that library’s digitization and artificial intelligence efforts, visited Stanford earlier this month to meet with SUL AI Studio project teams. Though the experiments at the National Library of Norway and those we are beginning at Stanford Libraries are mostly oriented to improving and augmenting existing practices, our conversation eventually turned to the influence that AI-augmented technologies already have in our daily lives and how that is changing our assumptions about information access and discovery.

Discovery in the library remains closely tied to the card catalog. When bibliographic records first went digital, libraries were still thinking in terms of finite information in a closed system. Meanwhile, patrons shifted their search to the World Wide Web, specifically Google, as a starting point to discover materials in library catalogs. Individual libraries could not compete with the comprehensive search of indexed catalog records that Google offered. Over the past few years, Philip Schreur (Associate University Librarian for Technical and Access Services, Stanford Libraries) has been leading Stanford’s move to a networked data model for libraries based on RDF, one that is more fluid, generative, and deeply connected, where every element has a unique identifier and descriptive metadata is semantically structured and machine readable.

Libraries have recognized the need to be more web-like in content delivery as well. In 2011, the Bodleian Library, the British Library, and the Stanford University Library conceived a model for a digital image delivery system that works in concert with the linked open data model of a web of content. That model has become the International Image Interoperability Framework (IIIF). IIIF is now a mature community of international partners, and its Image and Presentation APIs make it possible to build virtual image collections at will, bringing together content housed in widely dispersed geographic locations as if it were on your local machine. Cultural heritage image archives no longer need to be “walled gardens of technology.” Institutions now have a shared method for easy exchange of data. But discovering those images remains a tremendous challenge.
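
To make the idea of a shared delivery method concrete, here is a minimal sketch of how a IIIF Image API request URL is assembled from its path segments (identifier, region, size, rotation, quality, format). The server address `https://example.org/iiif` and the identifier `harris/1234` are hypothetical placeholders, and the helper name `iiif_image_url` is my own; only the order of the path segments comes from the IIIF Image API.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an image request URL following the IIIF Image API pattern:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # The identifier is percent-encoded so that a slash inside it is not
    # mistaken for a path separator.
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and identifier, for illustration only.
thumbnail_url = iiif_image_url("https://example.org/iiif", "harris/1234", size="400,")
print(thumbnail_url)
```

Because every participating institution answers the same URL pattern, a viewer can request the same crop or size from different servers by changing only the base address and the identifier.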

Now is the time for a revolution in discovery. 

Dimensionality reduction and classification, the essence of machine learning, are powerful techniques we can employ to keep up with the need to generate metadata for discovery. Already, our friends at the National Library of Norway are automating Dewey classification. Similar work has been done to predict Library of Congress subject headings. We can also use machine learning models and algorithms for natural language processing to parse and structure the descriptive metadata we already have. Which is to say, the tremendous effort libraries have put into curating collections and carefully grooming catalog records over the years need not be lost. The Carnegie Museum of Art recently completed a series of experiments on the Teenie Harris photography collection, for example, where they automated the shortening of (very long) image titles, the cleanup of existing subject headings, and the extraction of names and locations from descriptions so that these could be fed appropriately into their archive data structure.

But the efficient and complex pattern matching enabled by deep learning opens up new approaches to research and discovery that go beyond metadata. Several medical imaging research papers published in the last year show that the same kind of convolutional neural network used to find kittens and skateboards in pictures can be used to identify a diseased retina from a scan, diagnose pneumonia from an x-ray, and distinguish malignant from benign tumors using a picture taken with a smartphone. These techniques also reveal patterns that were unanticipated or that researchers did not even think were quantifiable. Looking for signs of a diseased retina, for example, also revealed patterns that indicate the person’s age, gender, BMI, and whether the person is a smoker. Clustering work done by the Frank-Ratchye STUDIO for Creative Inquiry on the Teenie Harris photograph archive mentioned above uncovered sets of photographs of women in fur coats and of car crashes.
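
As a rough illustration of how clustering can surface groups like these without any labels, the sketch below runs plain k-means over a handful of made-up two-dimensional feature vectors. This is not the STUDIO's method, which is not described here; k-means is swapped in as a simple stand-in, and in practice the vectors would have many more dimensions and would come from a neural network rather than being typed by hand.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its members, until nothing changes."""
    assignments = []
    for _ in range(iterations):
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assignments, centroids


# Hypothetical image features: two photos resemble each other, as do the other two.
features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centers = kmeans(features, centroids=[features[0], features[2]])
print(labels)
```

The algorithm never sees a caption or subject heading; the groups emerge from the numeric features alone, and it is then up to a person to look at a cluster and recognize what it shows.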

As researchers begin to identify useful visual patterns in image collections, they are likely to want to search a collection to find matching visual patterns. Recent work on similarity search holds out promise that researchers who have amassed a collection of images and want to find similar images can search not against the structured data of a catalog, but based on the pixel representation of the image to find nearest-neighbor images in high-dimensional vector space. And this type of search can be used to further enhance metadata as well, completing the virtuous cycle of discovery. As our metadata librarian, Hilary Thorsen, pointed out in a SUL AI Studio meeting recently, similarity search on images would aid the practice of copy cataloguing, matching against an established bibliographic record. 
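
A minimal sketch of that kind of nearest-neighbor lookup follows, assuming each image has already been reduced to a feature vector. The image identifiers and the four-dimensional vectors are invented for illustration, cosine similarity is one common choice of measure rather than the one used in the work cited, and a real system would use an approximate index instead of scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, index, k=3):
    """Return the k (image_id, score) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), image_id) for image_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(image_id, score) for score, image_id in scored[:k]]


# Hypothetical feature vectors standing in for the output of an image model.
index = {
    "fur_coat_01": [0.9, 0.1, 0.0, 0.2],
    "fur_coat_02": [0.8, 0.2, 0.1, 0.1],
    "car_crash_01": [0.1, 0.9, 0.3, 0.0],
    "portrait_01": [0.2, 0.1, 0.9, 0.4],
}
query = [0.85, 0.15, 0.05, 0.15]  # vector for the image a researcher supplies
results = nearest_neighbors(query, index, k=2)
print(results)
```

The query never touches a catalog record. Once matches are found, however, the catalog records attached to them can be offered back as candidate metadata for the query image, which is where the copy cataloguing use comes in.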

If an historian of medieval Europe, for example, wanted to find manuscripts written in half uncial, keyword search on our catalog would not help, but if he (I have one in mind) could drop a sample image in a search box to retrieve results with matching script, we could provide the researcher with the results he is seeking and learn from that query to improve our catalog at the same time.


Papers linked above:

Schreur, Philip Evan. "The academy unbound." Library Resources & Technical Services 56.4 (2012): 227-237.

Snydman, Stuart, Robert Sanderson, and Tom Cramer. "The International Image Interoperability Framework (IIIF): A community & technology approach for web-based images." Archiving Conference. Vol. 2015. No. 1. Society for Imaging Science and Technology, 2015.

Poplin, Ryan, et al. "Predicting cardiovascular risk factors from retinal fundus photographs using deep learning." arXiv preprint arXiv:1708.09843 (2017).

Rajpurkar, Pranav, et al. "CheXNet: Radiologist-level pneumonia detection on chest x-rays with deep learning." arXiv preprint arXiv:1711.05225 (2017).

Esteva, Andre, et al. "Dermatologist-level classification of skin cancer with deep neural networks." Nature 542.7639 (2017): 115.

Iscen, Ahmet, et al. "Memory vectors for similarity search in high-dimensional spaces." IEEE Transactions on Big Data 4.1 (2018): 65-77.