Ingest Checklist

Key Components of Ingest

Digital Assets Submission Inventory

All depositors must fill out a Digital Assets Submission Inventory (DASI) prior to ingest. The DASI designates the specific body of content to be ingested and serves as the official record of submission. It should be signed by the University Librarian of the depositing institution or, for consortia and non-library depositors, by someone acting in a similar role. If the appropriate rights have been obtained, a Creative Commons declaration form may be used to provide access to submitted works that are not in the public domain.

Submission of Bibliographic Metadata

Bibliographic metadata for each digital volume must be present in HathiTrust before the volume can be ingested. Depositors make accurate bibliographic data available to the Zephir team at the California Digital Library (University of California), which loads it into HathiTrust's bibliographic management system (see the Ingest Checklist below for details). The bibliographic records act as a manifest of the digital content and are used as part of a broader digital object management strategy, including rights management. HathiTrust uses information in the submitted bibliographic records to make an initial rights determination about each volume. Details about which information in each bibliographic record is used to make rights determinations are available at https://www.hathitrust.org/bib_rights_determination.

Bibliographic Metadata Specifications

Bibliographic metadata associated with digital volumes should conform to HathiTrust's bibliographic metadata specifications.

Submission of content

Content can be submitted in a variety of ways; the method used is specified in the Digital Assets Submission Inventory.

Ingest Checklist

Administrative Coversheet

We request a variety of administrative information related to the ingest of both bibliographic metadata and content. A fillable coversheet that can be copied, completed, and submitted is available as a Google Doc.

Bibliographic Metadata Ingest

  • Contributor sends bibliographic records to the Zephir team at the University of California (UC).
    • Bibliographic data from each distinct source (e.g., Google, Internet Archive, local) should be sent separately, one file per source. See Bibliographic Metadata Submission.
  • Zephir receives the bibliographic data.
  • Zephir performs duplicate detection based on OCLC numbers.
    • This duplicate detection does not weed items; it associates items HathiTrust receives with existing bibliographic records (if duplicates are detected) or creates new records (if no records with matching OCLC numbers exist).
  • Zephir loads the bibliographic data into the metadata management system.
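The duplicate-detection step can be pictured with a small sketch. It assumes incoming records are matched on OCLC numbers after a simplified normalization (stripping the common `(OCoLC)`, `ocm`, `ocn`, and `on` prefixes and leading zeros). The `BibRecord` class, the `associate_item` function, and the record-ID scheme are hypothetical and do not describe Zephir's actual implementation. Like the step it illustrates, the sketch never discards an item: it either attaches the item to an existing record or creates a new one.

```python
from dataclasses import dataclass, field


@dataclass
class BibRecord:
    """Hypothetical bibliographic record holding the items attached to it."""
    record_id: str
    oclc_numbers: set = field(default_factory=set)
    item_ids: list = field(default_factory=list)


def normalize_oclc(raw: str) -> str:
    """Simplified OCLC normalization (an assumption): drop common prefixes and leading zeros."""
    value = raw.strip()
    if value.startswith("(OCoLC)"):
        value = value[len("(OCoLC)"):]
    for prefix in ("ocm", "ocn", "on"):
        if value.startswith(prefix):
            value = value[len(prefix):]
            break
    return value.lstrip("0") or "0"


def associate_item(item_id: str, raw_oclcs: list, index: dict) -> BibRecord:
    """Attach an item to an existing record sharing an OCLC number, or create a new record.

    No item is ever discarded; `index` maps normalized OCLC numbers to records.
    """
    oclcs = {normalize_oclc(o) for o in raw_oclcs if o.strip()}
    for oclc in sorted(oclcs):
        existing = index.get(oclc)
        if existing is not None:
            existing.item_ids.append(item_id)
            existing.oclc_numbers |= oclcs
            for o in oclcs:
                index.setdefault(o, existing)  # register any newly seen OCLC numbers
            return existing
    # No match (or no OCLC numbers at all): create a new record for this item.
    record = BibRecord(record_id=f"bib-{item_id}", oclc_numbers=set(oclcs), item_ids=[item_id])
    for o in oclcs:
        index[o] = record
    return record
```

In this sketch, an item whose OCLC number normalizes to one already seen (for example `ocm12345` after `(OCoLC)00012345`) is attached to the same record rather than producing a duplicate.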

Content Ingest

Google Content:

  • Contributor selects a content package type and communicates the choice to Google.
  • Most Contributors use a hybrid format for scanned volumes, containing bitonal TIFF files for all-text pages and JPEG2000 files for pages with images. This is the most cost-effective package: on average, hybrid volumes are 25% smaller than all-JPEG2000 volumes.
  • Contributor requests that Google open the Contributor's GRIN instance to HathiTrust.
  • HathiTrust requests decryption keys from Google and begins downloading from GRIN.
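As a rough illustration of the hybrid package, the sketch below assigns a format to each page and estimates package size from the 25% average figure. The `Page` class and its `has_images` flag are hypothetical stand-ins for however page content is actually classified during scanning. Because 25% is an average, the estimate should not be expected to hold for any individual volume.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical page descriptor; `has_images` marks pages with non-text content."""
    sequence: int
    has_images: bool


def hybrid_format(page: Page) -> str:
    """Hybrid rule: bitonal TIFF for all-text pages, JPEG2000 for pages with images."""
    return "JPEG2000" if page.has_images else "bitonal TIFF"


def estimate_hybrid_size_mb(all_jp2_size_mb: float, average_savings: float = 0.25) -> float:
    """Estimate hybrid size from the all-JPEG2000 size using the 25% average reduction."""
    return all_jp2_size_mb * (1 - average_savings)


pages = [Page(1, False), Page(2, True), Page(3, False)]
print([hybrid_format(p) for p in pages])  # ['bitonal TIFF', 'JPEG2000', 'bitonal TIFF']
print(estimate_hybrid_size_mb(100.0))     # 75.0
```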

Internet Archive-digitized Content:

  • HathiTrust uses identifier information included in the bibliographic metadata to download volumes from the Internet Archive and ingest them into HathiTrust.
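A minimal sketch of this step follows. It assumes each bibliographic record has already been parsed into a dictionary and that the Internet Archive identifier sits under a hypothetical `ia_identifier` key; where the identifier actually appears in submitted records is not specified here. It builds download URLs using the Internet Archive's public `https://archive.org/download/<identifier>` pattern and skips records without an identifier.

```python
from urllib.parse import quote

# Public Internet Archive download URL prefix.
IA_DOWNLOAD_BASE = "https://archive.org/download/"


def ia_download_urls(records: list) -> list:
    """Build download URLs from parsed bibliographic records.

    `ia_identifier` is a hypothetical key; records lacking it are skipped,
    and repeated identifiers are queued only once.
    """
    urls = []
    seen = set()
    for record in records:
        identifier = (record.get("ia_identifier") or "").strip()
        if not identifier or identifier in seen:
            continue  # no identifier, or already queued
        seen.add(identifier)
        urls.append(IA_DOWNLOAD_BASE + quote(identifier, safe=""))
    return urls
```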

Non-Google Content:

  • Contributor performs any needed pre-ingest transformations (HathiTrust provides tools to assist in the transformation, validation, and packaging of materials for ingest).
  • Contributor delivers content to HathiTrust through agreed-upon mechanisms (hard drive, file download, etc.).
  • Contributor may perform pre-ingest quality assurance.
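One simple pre-ingest QA check is fixity verification. The sketch below verifies the files in a package directory against an md5sum-style manifest. The manifest name `checksum.md5` and the flat package layout are assumptions for illustration; HathiTrust's own packaging and validation tools and specifications take precedence over this generic check.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Compute a file's MD5 digest in chunks so large images need not fit in memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def validate_package(package_dir: Path, manifest_name: str = "checksum.md5") -> list:
    """Return a list of problems found; an empty list means the package passed.

    Assumes md5sum format: '<hex digest>  <file name>' (or '<hex digest> *<file name>').
    """
    manifest = package_dir / manifest_name
    if not manifest.is_file():
        return [f"missing {manifest_name}"]
    listed = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        digest, _, name = line.partition(" ")
        listed[name.strip().lstrip("*")] = digest.lower()
    problems = []
    for name, expected in listed.items():
        target = package_dir / name
        if not target.is_file():
            problems.append(f"listed but missing: {name}")
        elif md5_of(target) != expected:
            problems.append(f"checksum mismatch: {name}")
    # Flag files present in the package but absent from the manifest.
    for path in package_dir.iterdir():
        if path.is_file() and path.name != manifest_name and path.name not in listed:
            problems.append(f"not in manifest: {path.name}")
    return problems
```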

All Content:

  • HathiTrust ingests a sample of objects, which includes validating the objects and creating HathiTrust METS and PREMIS metadata.
  • HathiTrust performs internal testing to make sure the volumes work properly in the system.
  • Ingest reports are made available to the Contributor at https://www.hathitrust.org/ingest_logs.
  • Contributor may perform additional QA testing on the ingested objects.
  • Full ingest of the Contributor's digital content begins.
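The sample-first sequence in this checklist amounts to a gate: full ingest proceeds only after a sample passes. The sketch below models that gate under simplifying assumptions. The `ingest_plan` function, the `validate` callable (standing in for validation, internal testing, and contributor QA together), and the default sample size are all hypothetical and are not HathiTrust's actual procedure or parameters.

```python
import random
from typing import Callable, Sequence


def ingest_plan(volume_ids: Sequence[str], validate: Callable[[str], bool],
                sample_size: int = 20, seed: int = 0) -> dict:
    """Hypothetical gate: check a random sample first; approve full ingest only if all pass.

    `validate` stands in for validation, internal testing, and contributor QA.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    sample = rng.sample(list(volume_ids), min(sample_size, len(volume_ids)))
    failures = [v for v in sample if not validate(v)]
    if failures:
        # Hold full ingest until the problems found in the sample are resolved.
        return {"status": "hold", "sample": sample, "failures": failures}
    sampled = set(sample)
    remaining = [v for v in volume_ids if v not in sampled]
    return {"status": "full_ingest", "sample": sample, "remaining": remaining}
```

Seeding the random generator keeps the sample reproducible, which makes it easier to rerun the same checks after a problem in the sample has been fixed.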