Note: You can sign up for a Google Group mailing list to receive updates on HathiTrust ingest tools.
HathiTrust's mission is to ensure the long-term preservation and accessibility of materials in the digital archive. One way we do this is by keeping materials submitted from different sources consistent. To that end, we have defined baseline requirements for content in a number of areas, including:
- Item identifiers (i.e. how each individual submitted item is identified and named)
- Package layout (file names, directory structure, etc.)
- Image technical characteristics (file format, resolution, color depth, etc.)
- Image metadata (scanning time, scanning artist, etc.)
- Optionally, page number and page tag metadata
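Before using any of the tools described on this page, depositors may find it useful to take a quick local inventory of a package. The following is a minimal sketch, not part of any HathiTrust tool: it assumes a package is an ordinary local directory, the function name `summarize_package` is hypothetical, and it checks nothing against the actual specifications. It only reports which file types and subdirectories are present, so the result can be compared by hand with the package layout requirements.

```python
from collections import Counter
from pathlib import Path


def summarize_package(package_dir):
    """Inventory one package directory (hypothetical helper, not a validator).

    Returns a tuple of:
      - a dict mapping lowercase file extension to the number of files with it
        (files without an extension are counted under "(none)")
      - a sorted list of subdirectory names found at the top level
    """
    root = Path(package_dir)
    extension_counts = Counter()
    subdirectories = []
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            subdirectories.append(entry.name)
        else:
            extension_counts[entry.suffix.lower() or "(none)"] += 1
    return dict(extension_counts), subdirectories
```

Calling `summarize_package("path/to/package")` returns the extension counts and the list of subdirectories. An unexpected extension or a stray subdirectory is a prompt to recheck the package against the requirements, not a validation result.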
Ingest Tool Options
We have made three tools available to help depositors prepare content to these specifications:
- A single-image validator (http://bit.ly/R9S52t) - validates single uploaded images and provides a report on compliance with HathiTrust specifications.
- A full-volume validator and packaging service (http://bit.ly/1jboMIC) - validates full volumes and remediates problems if provided with sufficient instructions and metadata.
- A HathiTrust Submission Information Package (SIP) Validator (https://github.com/hathitrust/ht_sip_validator) - a command-line tool that allows content preparers to validate locally-digitized content prior to submission to HathiTrust. (Note: We have only released Phase 1 of the Validator. A package that successfully passes validation through the SIP Validator is not currently guaranteed to be successfully ingested into HathiTrust.)
The single-image and full-volume tools are documented at the links above. Contact firstname.lastname@example.org with any questions about using the tools or to submit sample packages or images.
Steps to ingest
Depositors should review the Guidelines for Deposit to gain an understanding of the ingest requirements and how much transformation of content may be needed prior to submission to HathiTrust.
We can help guide depositors through the ingest process and diagnose problems along the way (email email@example.com).