HTRC serves a community interested in research and educational computational investigation of the HathiTrust corpus.
HTRC hides the complexity of computational investigation of the massive HathiTrust corpus. A researcher engages with HTRC through a complexity-hiding interface, as shown in the figure. The interface provides a web portal and a programmatic interface. HTRC brings together various text mining tools, the HathiTrust corpus, aggregated and statistical feature information about the corpus, and other data sources needed for text mining. Text mining tools are then run on compute resources that are co-located with the HathiTrust data.
The work of researchers and educators and the HathiTrust corpus itself are all protected by a single security infrastructure. Users are given secure areas to carry out their activities, and the digital repository is protected from unauthorized use. User identities are managed by an identity server from WSO2.
The HathiTrust corpus is large, and projects vary widely in scale: where one classroom project may use 1300 volumes, another may need 4.3 million. HTRC expects to provide compute cycles for free, drawing on the strong cyberinfrastructure bases of IU and UIUC and on sponsorship. Large-scale tasks will need a separate access policy. With the help of our advisory board, advice from the HathiTrust Board of Governors, and the input of researchers, the HTRC will develop a set of policies with a predictable set of prices. These policies will be published as they are developed, vetted, and implemented during Phase II of the project.