
HTRC Architecture and Technical Organization

The HTRC will provide a persistent and sustainable structure to enable original and cutting-edge research. It will stimulate the development of new functionality and tools, making possible discoveries that could not be made without the HTRC.

Key architectural and organizational aspects

  1. A unique aim of the Center is support for "non-consumptive" research on in-copyright works, allowing scholars to perform computational analyses on these works within the bounds of U.S. copyright law. We anticipate accomplishing this through a novel grid- and cloud-based architecture combined with security, auditing, and provenance collection. Non-consumptive research has been defined as “research in which computational analysis is performed on one or more books, but not research in which a researcher reads or displays.”
  2. HTRC will version the HathiTrust collection to enable researchers to tie research back to the version that was active when the research was carried out. Versions will have unique identifiers (such as DOIs) that are long-lasting and immutable.   
  3. HTRC will support interoperability through the use of InCommon SAML identities for access by members of the academic federation (see HTRC Access and Use for more information).
  4. HTRC will give preference to HathiTrust members (see HTRC Access and Use).
  5. HTRC is built on the sound and well-tested principles of a service-oriented architecture to enable interoperability. As part of this, it will maintain a registry repository of text mining algorithms, indexes, and retrieval tools, available online for human and programmatic discovery. It may also register derived data sets, indexes, and versions in the registry repository. It is partnering with WSO2, an open source company, on several components of the architecture.
  6. HTRC is intended as a user-driven resource, guided by an active advisory board and a community sharing model that allows users to share their algorithms and tools.
  7. HTRC will provide access to familiar text mining and retrieval tools such as MONK and SEASR, applied first to the HTRC public domain works and later, with safeguards in place, to the copyrighted collection.
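
The versioning aim in item 2 can be illustrated with a small sketch. This is not the HTRC implementation. It only assumes that a collection version can be described by its volume identifiers and a checksum per volume, and it derives a deterministic identifier from them. The names `CollectionVersion` and `snapshot` and the `htrc-sketch:` prefix are hypothetical, and a production system would more likely mint a registered identifier such as a DOI.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionVersion:
    """Hypothetical immutable record describing one snapshot of a collection."""
    version_id: str
    volume_ids: tuple


def snapshot(volume_checksums: dict) -> CollectionVersion:
    """Derive a deterministic identifier from the volumes in a snapshot.

    volume_checksums maps a volume identifier to a checksum of its content.
    The same set of volumes and checksums always yields the same identifier.
    """
    digest = hashlib.sha256()
    # Sort so the identifier does not depend on dictionary ordering.
    for vol_id in sorted(volume_checksums):
        digest.update(vol_id.encode("utf-8"))
        digest.update(b"\0")
        digest.update(volume_checksums[vol_id].encode("utf-8"))
        digest.update(b"\n")
    return CollectionVersion(
        version_id="htrc-sketch:" + digest.hexdigest()[:16],  # assumed prefix
        volume_ids=tuple(sorted(volume_checksums)),
    )
```

A researcher could record the `version_id` alongside results. Any later change to a volume's content produces a different identifier, so the results stay tied to the version that was active when the research was carried out.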

It is important to delineate the structure of the HathiTrust Research Center with respect to HathiTrust itself. The HathiTrust Repository offers long-term preservation and access services, including bibliographic and full-text search and reading capabilities for public domain volumes and some copyrighted volumes. The HathiTrust Research Center, on the other hand, provides computational research access to the HathiTrust collection. Limited reading of materials will be possible in the Research Center to accommodate needs such as reviewing results, but the destination for reading-based research remains the HathiTrust Repository.
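
The difference between computational access and reading can be made concrete with a minimal sketch of a non-consumptive style of analysis. It assumes only that an analysis runs over the text and hands back derived statistics rather than the text itself. The function name `term_frequencies` and its tokenization rule are illustrative assumptions, not part of any HTRC tool.

```python
import re
from collections import Counter


def term_frequencies(text: str, top_n: int = 5) -> list:
    """Return the top_n (token, count) pairs for a text.

    Only aggregate counts leave the function; the text itself is never returned.
    """
    # Assumed tokenization: lowercase runs of ASCII letters.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens).most_common(top_n)
```

A real deployment would also need the security, auditing, and provenance safeguards described above, since even derived statistics can reveal content when a text is short or the counts are fine-grained.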