Key Components of Ingest
Digital Assets Submission Inventory
All depositors must fill out a Digital Assets Submission Inventory (DASI) prior to ingest. The DASI designates the specific body of content to be ingested and is the official record of submission. A Permissions Agreement may be used to provide access to submitted works that are not in the public domain if appropriate rights have been obtained.
Submission of Bibliographic Metadata
Bibliographic metadata for each digital volume must be present in HathiTrust in order for the volumes to be ingested. Depositors make accurate bibliographic data available to the University of California to be loaded into HathiTrust's bibliographic management system (see the Ingest Checklist below for details). The bibliographic records act as a manifest of the digital content and are used as part of a broader digital object management strategy, including rights management. HathiTrust uses information in the submitted bibliographic records to make an initial rights determination about each volume. Details about the information in each bibliographic record that is used to make rights determinations are available at http://www.hathitrust.org/bib_rights_determination.
Bibliographic Metadata Specifications
Bibliographic metadata associated with digital volumes should conform to our specifications.
Submission of Content
Submission of content can happen in a variety of ways, specified in the Digital Assets Submission Inventory.
We request a variety of administrative information related to ingest of both bibliographic metadata and content. A fillable coversheet that can be copied and submitted is available as a Google Doc.
Bibliographic Metadata Ingest
- Institution sends bibliographic records to the University of California (UC).
- Bibliographic data from each distinct source (e.g., Google, Internet Archive, local) should be sent separately, one file per source. See Submitting bibliographic records to UC.
- UC receives bibliographic data.
- UC performs duplicate detection based on OCLC number.
- This duplicate detection does not weed items. It associates items that the University of Michigan (UM) receives with existing bib records (if duplicates are detected) or creates new records (if bib records with matching OCLC numbers do not exist).
- UC loads bibliographic data into metadata management system.
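The duplicate-detection step above can be sketched in a few lines of Python. This is only an illustration of the matching logic as described, not the system UC actually runs: the `BibRecord` class, the `associate_items` and `normalize_oclc` functions, and the normalization rule (keep the digits, drop leading zeros) are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BibRecord:
    """Hypothetical bib record: one per OCLC number, holding many items."""
    oclc_number: str
    title: str
    item_ids: list[str] = field(default_factory=list)


def normalize_oclc(raw: str) -> str:
    """Assumed normalization: keep digits only and drop leading zeros."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits.lstrip("0") or digits


def associate_items(catalog: dict[str, BibRecord], incoming) -> dict[str, BibRecord]:
    """Attach each incoming item to a matching record, or create a new one.

    `incoming` yields (oclc_number, title, item_id) tuples. No item is
    ever discarded: a duplicate OCLC number only means the item joins
    the record that already exists.
    """
    for raw_oclc, title, item_id in incoming:
        key = normalize_oclc(raw_oclc)
        record = catalog.get(key)
        if record is None:
            # No bib record with this OCLC number yet: create one.
            record = BibRecord(oclc_number=key, title=title)
            catalog[key] = record
        record.item_ids.append(item_id)
    return catalog


catalog = associate_items({}, [
    ("(OCoLC)ocm00012345", "Example title A", "vol1"),
    ("12345", "Example title A", "vol2"),
    ("(OCoLC)999", "Example title B", "vol3"),
])
for record in catalog.values():
    print(record.oclc_number, record.item_ids)
```

In this sketch the first two items share an OCLC number once normalized, so they end up on the same record, while the third item gets a record of its own.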
Google-digitized Content:
- Institution selects a content package type and communicates the choice to Google
- Most HathiTrust institutions use a hybrid format for scanned volumes, containing bitonal TIFF files for all-text pages and JPEG2000 files for images. This is the most cost-effective package, as average volume sizes are 25% smaller for hybrid packages than for all-JPEG2000 packages.
- Institution requests that Google open Institution's GRIN instance to UM
- The University of Michigan requests decryption keys from Google and begins download from GRIN
Internet Archive-digitized Content:
- UM uses identifier information included in the bibliographic metadata to download volumes from the Internet Archive and ingest them into HathiTrust.
Locally-digitized Content:
- Institution performs any pre-ingest transformations that are needed (HathiTrust provides tools to assist in transformation, validation, and packaging of materials for ingest).
- Institution delivers content to UM through agreed-upon mechanisms (hard drive, file download, etc.)
- Pre-ingest quality assurance may be performed by partner institutions
- UM ingests a sample of objects, which includes validating the objects and creating HathiTrust METS and PREMIS metadata
- UM performs internal testing to ensure that volumes are working properly in the system
- Ingest reports are made available to the institution at http://www.hathitrust.org/ingest_logs
- Institution may perform additional QA testing on ingested objects
- Full ingest of partner institution's digital content begins