Getting Content Into HathiTrust

HathiTrust currently supports ingest of digital book and journal content, and similar book-like materials (e.g., manuscripts). The guidelines and specifications below pertain to these materials specifically. HathiTrust partners are also engaged in pilot projects to provide support for digital audio and image content, as well as born-digital publications. Further information, including policies and specifications surrounding these additional content types, will be forthcoming.

Ingest

Ingest of digital (book-like) objects and associated metadata is performed at the University of Michigan (UM). The digital objects are then replicated to HathiTrust's active mirror site in Indiana, and stored on backup tape. Information about HathiTrust's technical infrastructure can be found at http://www.hathitrust.org/technology.

New Digitization

Institutions that are preparing to engage in digitization projects and wish to comply with HathiTrust specifications should consult the University of Michigan digitization specifications.

Existing Digitized Content

HathiTrust has established mechanisms to accommodate the ingest of content from Google and the Internet Archive efficiently and at scale, such that there are no costs associated with ingest from these sources.

HathiTrust also supports ingest of book and journal content digitized locally by institutions or as part of vended projects. For this content, we encourage institutions to undertake any content or metadata transformations needed to meet HathiTrust specifications before submitting content. Where institutions are not able, or would prefer not, to perform these transformations, the University of Michigan will undertake to do what is necessary to prepare content for ingest. Depending on the nature of the transformations, there may be an associated cost (for instance, if OCR text needs to be generated); some content may not be remediable to HathiTrust specifications.

HathiTrust has developed a framework to facilitate the ingest of book and journal content from a variety of sources. The framework includes:

  • Guidelines for Deposit, including HathiTrust policies on preservation, content formats, metadata and descriptive information, validation, and quality for digitized books and journals.
  • A Deposit Form containing detailed specifications and requirements for submitted content and metadata. Institutions that would like UM to prepare content for ingest must fill out the form. Institutions that perform their own transformations must comply with the specifications listed, and are encouraged to fill out the form to aid in content preparation and submission.
  • A Checklist of the steps and responsibilities that are involved in the ingest process.

How to Proceed

Institutions depositing content from Google or the Internet Archive are encouraged to read the Guidelines for Deposit, but should proceed directly to the Ingest Checklist to learn next steps.

Institutions depositing locally digitized content or vendor content from sources other than Google or the Internet Archive should begin by examining the Guidelines for Deposit and the Deposit Form.