Getting Content Into HathiTrust

Partners get their content into HathiTrust in two steps: ingest of the bibliographic data, followed by ingest of the digital content itself.

Bibliographic Data

Before content is ingested, a Participating Institution makes accurate bibliographic records available for its digital objects. Those records are loaded into a database at Michigan. The record specifications for this process follow guidelines that are still to be determined, but each record requires an OCLC number. The records in Michigan’s database are used as part of the Repository’s digital object management strategy. Each record and each digital page image identifies the source of the print volume on which the digital file is based.
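As a rough illustration of the requirement above, a pre-submission check might confirm that each record carries an OCLC number and names the source of the print volume. This is only a sketch: the field names (oclc_number, source_institution), the dictionary layout, and the sample records are assumptions made for the example, since the actual record specification is still to be determined.

```python
def missing_required_fields(record):
    """Return the names of required fields that are absent or blank.

    The field names are hypothetical; the real record specification
    has not yet been defined.
    """
    required = ("oclc_number", "source_institution")
    return [name for name in required if not str(record.get(name) or "").strip()]


# Made-up sample records, used only to exercise the check.
records = [
    {"title": "Example volume A", "oclc_number": "12345678", "source_institution": "Example Library"},
    {"title": "Example volume B", "oclc_number": "", "source_institution": "Example Library"},
]

for record in records:
    problems = missing_required_fields(record)
    status = "ok" if not problems else "missing: " + ", ".join(problems)
    print(record["title"], "->", status)
```

A check of this kind would let an institution catch records without an OCLC number before they are offered for loading.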

Content

Content can be ingested into HathiTrust through several mechanisms, including ingest from Google, removable drives, and Internet delivery (see below). All content must be documented with a signed Digital Assets Agreement and Submission Inventory. A Permissions Agreement can be used to provide access to works that are not in the public domain.

  • Google submission mechanism. HathiTrust initially supports ingest of content from Google. Ingest from Google takes place through automated processes managed by the University of Michigan. These processes ensure reliable transfer of all content digitized by Google for a Participating Institution; however, Google makes the Participating Library’s content on its return server accessible only with the permission of the Participating Library. In cooperation with Google, HathiTrust has also established procedures for using the Google return mechanisms to ingest content not digitized by Google. These mechanisms involve the transfer of content to Google.
  • Non-Google submission mechanism. Over its first year of operation, HathiTrust will work to develop mechanisms for direct ingest of content from at least one non-Google source (consisting of page image files and related metadata for book and journal content) specified by the Operational Advisory Board. Currently, the Repository accepts only page image files, their associated OCR files, and metadata.
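
Because the Repository currently accepts only page image files, associated OCR files, and metadata, a submitter could screen a package for other file types before delivery. The sketch below is a minimal example of such a screen. The file extensions, the mapping of extensions to content kinds, and the sample filenames are all assumptions chosen for illustration and do not reflect any HathiTrust specification.

```python
from pathlib import PurePath

# Hypothetical mapping of file extensions to the three accepted kinds of content.
ACCEPTED_KINDS = {
    ".tif": "page image",
    ".jp2": "page image",
    ".txt": "OCR",
    ".xml": "metadata",
}


def classify_files(filenames):
    """Split filenames into accepted files (grouped by kind) and rejected files."""
    accepted = {}
    rejected = []
    for name in filenames:
        kind = ACCEPTED_KINDS.get(PurePath(name).suffix.lower())
        if kind is None:
            rejected.append(name)
        else:
            accepted.setdefault(kind, []).append(name)
    return accepted, rejected


# Made-up package listing, used only to exercise the function.
accepted, rejected = classify_files(
    ["00000001.tif", "00000001.txt", "volume.xml", "notes.docx"]
)
print(accepted)
print(rejected)
```

Anything that lands in the rejected list would need to be removed or converted before the package is submitted.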