HathiTrust Research Center Awards Six ACS Projects

HathiTrust Research Center (HTRC) is pleased to announce the award of its third round of Advanced Collaborative Support (ACS) projects. This round’s request for proposals focused on projects that engage with in-copyright data, especially via HTRC’s Data Capsule service. Proposals addressed a diverse set of topics and research areas and came from a varied pool of institutions around the world. Out of a large body of quality submissions, six projects were awarded.

Awardees will receive dedicated HTRC staff time for up to six months to support their research using texts in the HathiTrust Digital Library. New this round, each ACS project team will release its workset (a research collection of data or metadata for analysis) publicly, allowing other researchers to engage with the same workset. Proposals were reviewed on their feasibility, research methodology, and compatibility with HTRC staff resources, as well as the availability of requested data and their potential to benefit a wide community of scholars.

The six awarded projects are:

Computational Support for Reading Chicago Reading
Robin Burke, John Shanahan, Ana Lucic (DePaul University)
The Reading Chicago Reading team will seek to extend their own research on the “One Book, One Chicago” (OBOC) city-wide reading program by incorporating textual analysis of books chosen for the program, as well as comparison texts. Further, the resulting textual analysis (including toponym extraction, sentiment analysis, and story arc detection) will be paired with library patron, circulation, and demographic data to present a fuller picture of the OBOC program and the books chosen for inclusion.

Modeling the History of Book Design
David Bamman, Bjorn Hartmann (University of California, Berkeley)
This project will utilize the HTRC Data Capsule to conduct feature extraction on page images from 10,000 in-copyright books in the HathiTrust repository, extracting features such as page construction, line justification, leading between baselines, kerning between letter pairs/combinations, line density per page, characters per line, position of images, typeface (serif, sans-serif) and font size. Beyond the analysis and utility of the extracted feature set, this project also seeks to serve as a use case for engagement with HathiTrust/HTRC beyond books-as-strings-of-words analysis.

The Power of Place: Structure, Culture, and Continuities in U.S. Women’s Movements
Laura Nelson (Northeastern University)
Dr. Nelson’s project will study the women’s movement in the United States from 1848 to 1975 in two cities, New York City and Chicago, using new advances in network analysis and computational text analysis to identify structural and cultural diversity. The approach is three-pronged: building a workset of writing by individuals and organizations within the movements in New York and Chicago, using network analysis to measure the structure of these movements, and conducting computational text analysis to measure their underlying culture and ideas, including lexical analyses to identify distinctive words and topic modeling to identify dominant themes.

A Computational History of the U.S. Novel, 1950-2000
Richard Jean So (McGill University)
Dr. So’s project seeks to write a new history of the American novel by examining a series of large textual datasets focused on the full cycle of the U.S. literary field from production to reception to canonization. The major goal is to identify the emergence of new patterns of language, style, discourse, and themes in American novels as they appear at different moments in the cycle of literary production and reception, including publication via large publishing houses such as Random House, and book reviews in major U.S. periodicals. The project will use the HTRC Data Capsule environment to analyze full texts with methods from machine learning and natural language processing, such as topic models and word embeddings, and with specialized tools such as BookNLP, which allows for the extraction of grammatical dependencies and characters.

Measuring Literary Novelty
Laura McGrath, Devin Higgins, Arend Hintze (Michigan State University)
This work draws on ongoing collaborative efforts to develop a method for applying genetic sequencing tools to the study of literature in order to identify and measure literary novelty and to address questions of literary history, canonicity, and prestige. Previous results suggest a prominent connection between the purely information-based novelty of the sequences of characters that comprise literary texts and the experimental newness we associate with modernist literary texts. The HTRC Data Capsule offers the potential to apply this theory at scale for the first time, and may lead to new research on modernism and the literary history of the 20th century.

A Writer’s Workshop Workset with the Program Era Project (PEP)
Nicholas Kelly, Loren Glass, Nikki White (University of Iowa)
The PEP team will compile a proof-of-concept workset that begins with prominent individuals (faculty, staff, students) who were involved with the Iowa Writers’ Workshop (IWW), then produce “style cards” for each author’s works (by volume), based on stylometric data gathered through text analysis of the IWW workset within the HTRC Data Capsule. The project also aims to create a living workset that can be continually updated for scholars who wish to engage with IWW authors and their writing.

HTRC releases ACS program requests for proposals annually and is funded in part by HathiTrust, Indiana University, and the University of Illinois. For more information about ACS, contact For general inquiries, contact