Users will be able to access the HTRC in two ways: through a web portal, or programmatically through a service API.
The target audience of the HTRC is nonprofit and educational researchers. However, some tools will be available to the general public, because some of the content in the portal will be accessible without a login. In other words, a user coming in through the web portal will be able to perform simple operations on the public-domain corpus without logging in. A login will be required for more extensive operations and for all programmatic web-services access (more information about access to the HTRC follows).
The HTRC will use the InCommon security infrastructure, a common framework for trustworthy shared management of access to online resources in support of education and research in the United States, built on the identity management infrastructure of universities. This means that HTRC users will be able to sign in with their university credentials. Nearly all HathiTrust member universities support InCommon. InCommon also offers a bridge through which individuals can obtain InCommon identities, so access is not restricted to university members.
Running an algorithm over 10.6 million volumes of text and their indexes requires thousands of hours of computing time. While the HTRC expects to provide some computing cycles free of charge, drawing on the strong cyberinfrastructure of Indiana University (IU) and the University of Illinois at Urbana-Champaign (UIUC) and on sponsorship, those resources will be depleted quickly if many users draw on them. So how will computing resources be allocated? With the help of our advisory board, advice from the HathiTrust Board of Governors, and input from faculty and researchers, the HTRC will develop a fair and equitable set of policies with predictable prices. These policies will be published as they are developed, vetted, and implemented during Phase I of the project.
The HTRC initiative began on July 1, 2011. The first phase is an 18-month development cycle. During this phase, the HTRC will set up the core cyberinfrastructure and data analysis tools, create end-user services and a portal, develop support center capabilities, and provide minimal support for derived research data. In Phase I, only the public-domain works in HathiTrust will be used, although, if feasible, the HTRC may also provide access to the HathiTrust full-text index for appropriate uses during this phase.