HathiTrust Research Center Awards Three ACS Projects for 2020

The HathiTrust Research Center (HTRC) has selected three projects for its fifth round of Advanced Collaborative Support (ACS) awards. The applications received were of especially high quality for this abbreviated (six-month) round of support. The selected projects demonstrate a range of research approaches and disciplines.

Awardees will receive dedicated HTRC staff time to support their research using texts in the HathiTrust Digital Library, with support ending no later than January 2021. Proposals were reviewed for feasibility, research methodology, and compatibility with HTRC staff resources, as well as the availability of the requested data and the potential to benefit a wide community of scholars.

The new projects are:

Surveying Applicability of Energy Recovery Technology for Waste Treatment

Aduramo Lasode (University of Minnesota)

This project focuses on data collection and analysis regarding six prime movers: gas turbines, steam turbines, microturbines, reciprocating internal combustion engines, solid oxide fuel cells, and Stirling engines. Prime movers are part of an energy recovery effort that reuses heat and power generated from combustible gases produced in waste treatment. To support renewable energy, prime mover technology needs to be optimally implemented within the waste treatment industry. The project will address the difficulty of making optimal choices for waste treatment applications and aid a growing distributed waste treatment industry. First, data collection from the HathiTrust Digital Library will extract values that influence the use of these prime movers, mainly power output, efficiency, capital cost, and fuel composition. Then, data analysis will be performed using methodology from a preliminary study that evaluated efficiency-based applicability for five of the six prime movers. The major output of this project is a dataset containing power, efficiency, cost, and fuel-related information for the six prime movers. The goal is to publish the data and resulting analysis and to incorporate them into the researcher's dissertation. The project outcomes will impact relevant fields in sustainability, combustion, and energy policy by providing a practical decision guide for choosing prime movers in waste treatment facilities and by highlighting the need for innovation, with a special focus on prime mover applicability in the growing distributed waste treatment industry.

Detecting and Transcribing Arabographic Texts

David Smith (Northeastern University), Matthew Thomas Miller (University of Maryland), Maxim Romanov (University of Vienna), and Sarah Bowen Savant (Aga Khan University, London)

Although it is not predominant, significant material in Arabographic languages, such as Arabic and Persian, is available in HathiTrust. Transcription accuracy in these languages is lower, however, than in Latin-script languages. Perhaps due to the use of synthetic data from digital fonts to train optical character recognition (OCR) systems, OCR models perform poorly on the large numbers of Arabographic printed books set in historical fonts or lithographed from manuscripts. When Arabic or Persian text is embedded in books in other languages, such as English, the non-Latin text is often transcribed as if it were English in a very strange font, resulting in near-zero accuracy. This project will work toward solutions to both problems. First, the research team will improve baseline Arabic and Persian OCR by finding editions of canonical texts in the HathiTrust collection and aligning images of those editions with existing digital transcriptions. Then they will use this aligned data to train new generalized models for a wide variety of typefaces and lithographed book hands. Second, the researchers will build classification models to detect pages with embedded Arabic and Persian, using only textual features of the matrix language drawn from the HTRC Extracted Features dataset. They will align existing digital transcripts of texts for 30 works in Arabic and 24 works in Persian in HathiTrust to run their baseline OCR system, and then extract page images of lines matched to spans of text in the digital editions.

Tracing the shifting rhetoric of ethnoracial difference in federal responses to education, 1958-2018

Andrés Castro Samayoa (Boston College)

This project leverages HathiTrust’s U.S. Federal Documents Collection to investigate how materials produced by the U.S. federal government document shifts in terminologies of ethnoracial difference. The project will focus on the documents and materials published by the Department of Education (formerly the United States Department of Health, Education, and Welfare) and related congressional documents from hearings in specialized subcommittees from 1958 until the present. It will explore how the rhetorics of ethnoracial difference overlapped with the growing allocation of federal resources to postsecondary institutions, particularly Minority Serving Institutions, in the latter half of the 20th century. The passage of the National Defense Education Act in 1958 was a watershed moment that signaled the greater engagement of the federal government in higher education. The subsequent passage of the Higher Education Act in 1965, alongside amendments through the 1990s and 2000s, allocated specific federal appropriations to support colleges and universities, including Historically Black Colleges & Universities, Tribal Colleges & Universities, Hispanic Serving Institutions, and Asian American & Native American Pacific Islander Serving Institutions. The project contributes to current work focusing on the history of federal responses to higher education in the United States and on the growing visibility of Minority Serving Institutions as a valuable part of the country’s postsecondary sector.