Help - Ingest

Is there a limit to how much content I can submit for ingest?

The amount of content you can submit depends on a number of factors, such as the content estimates you provided previously and what other contributors have estimated and ingested.

Are there any additional charges for ingested materials?

There are no charges above the yearly partnership fee for materials deposited by partner institutions (we no longer charge a per-GB fee for ingest). All members share in the cost of maintaining public domain content, and the maintenance costs of in-copyright content are shared among partners who have indicated via their print holdings data that they hold those volumes in their collections.

When is a new namespace needed?

Namespaces are used to avoid clashes between identifiers. Whenever a new identifier scheme will be used, a new namespace is needed. Information about preferred identifiers is available in our Deposit Guidelines.
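As a rough illustration of why a namespace prevents clashes, the sketch below shows two contributors reusing the same local identifier. The namespace names ("libA", "libB"), the dot separator, and the helper qualified_id are assumptions made for this example only; they are not taken from the Deposit Guidelines.

```python
# Hypothetical sketch: namespace-qualified identifiers stay unique
# even when two contributors reuse the same local identifier.

def qualified_id(namespace: str, local_id: str) -> str:
    """Join a namespace and a local identifier (dot separator assumed)."""
    return f"{namespace}.{local_id}"

# "libA" and "libB" are made-up namespaces for illustration.
items = [("libA", "000123"), ("libB", "000123")]

# Without a namespace, the two items collapse to one identifier.
local_ids = {local for _, local in items}

# With a namespace, each item keeps a distinct identifier.
qualified_ids = {qualified_id(ns, local) for ns, local in items}

print(len(local_ids), len(qualified_ids))
```

If a contributor switched to a second identifier scheme whose values could overlap with the first, the same reasoning would apply, which is why a new scheme calls for a new namespace.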

When is a new DASI needed?

A new Digital Assets Submission Inventory (DASI) is needed whenever the responsible entity, content provider, or digitization source is different from previous submissions and/or when the number of volumes or the time period covered by the DASI has been exceeded. We prefer that a DASI cover a one-year period and multiple batches, rather than receiving a separate DASI for each batch.

When is a new administrative coversheet needed?

A new administrative coversheet is needed when the data previously provided in the coversheet changes. Typical events that prompt submission of a new administrative coversheet include: migration to a new bibliographic management system, use of a new identifier scheme, and ingest of content from a new digitization source.

We have been working with an institution that isn’t a HathiTrust member to scan content. Can we include that content in HathiTrust?

Yes, you can submit that content to HathiTrust. On the Digital Assets Submission Inventory, please indicate the HathiTrust partner in the “depositing institution” field and the name of the non-member institution in the “content provider” field.

Can I submit content for preservation in a “dark archive,” i.e., content that users cannot access in any way, including via search?

Currently, no such capacity exists. All content stored in HathiTrust is, at the least, accessible through the full-text search feature.

Will you notify me when content is ingested? How can I keep track of what has been successfully ingested and what has failed?

We try to communicate with project leads as things progress, and we will certainly be in touch when problems occur. However, for larger projects with large amounts of content, or for Internet Archive or Google content where ingest is automated, you may find it easier to track this yourself. We provide two kinds of reports. The ingest reports provide weekly overviews of ingest activities, and each report is broken down by partner. The ingest logs provide item-specific reports, also available for each week of activity; they include the item identifier and whether ingest succeeded or failed.
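If you track ingest yourself, a small script can tally a weekly log. The sketch below assumes a tab-separated log with one item per line (identifier, then status); that layout, the status words, and the sample identifiers are all hypothetical, so adjust the parsing to match the actual ingest log files.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; the real ingest log layout may differ.
SAMPLE_LOG = (
    "libA.000123\tsuccess\n"
    "libA.000124\tfailed\n"
    "libA.000125\tsuccess\n"
)

def summarize(log_text):
    """Return (status counts, list of identifiers that did not succeed)."""
    counts = Counter()
    failed = []
    for row in csv.reader(io.StringIO(log_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        item_id, status = row[0], row[1].strip().lower()
        counts[status] += 1
        if status != "success":
            failed.append(item_id)
    return counts, failed

counts, failed = summarize(SAMPLE_LOG)
print(dict(counts))
print(failed)
```

The list of failed identifiers can then be compared against what you submitted to decide which items to follow up on.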

How long will it take for my locally-digitized content to become available to users?

This depends on a number of factors. Bib data should be submitted before digital content, because ingest of the digital objects cannot begin until the bib records have been made available to repository systems. Once you submit your bib data, it takes 2 days for the records to be loaded to Zephir and then exported and added to the HathiTrust catalog. Once we have received the digital content, it takes time to remediate and package the content for ingest into the repository. If there are significant problems with the content, we may send it back to you for additional work on your part. Other ingest activities are typically going on at the same time and take up staff and machine bandwidth as well. Ingest of volumes occurs overnight, and there is a limit on the number of volumes that can be ingested in one night.

Will you accept the digital access file only, without the master TIFF?

At this time we are only accepting master files in TIFF ITU G4 or JPEG2000 format. Our general practice is to compress continuous tone TIFF files into JPEG2000. Specific image requirements are described in this document, which includes our process for receiving, validating, and packaging volumes from partner institutions for ingest. The document is linked to from http://www.hathitrust.org/ingest_tools, which has more information about the ingest tools we make available.

Is the derivative Unicode OCR text required?

OCR is required wherever it can be generated. We recognize that OCR is infeasible or impossible for some materials (e.g., handwritten manuscripts).

Can you accept JPEG2000 images that have been compressed with lossless compression?

Because lossless JP2s are large, we typically do not ingest them without first confirming why lossless compression is desired. There may be something special about the materials, e.g., they are rare books or special collections materials, where it is important to preserve the artifactual elements of the content. For general collections materials where there is no particular reason to capture the artifactual elements, our general practice, based on research led by Harvard and on considerations of file size and storage, is to use a certain compression rate for images. We understand that you would like the highest possible quality files preserved. However, our preservation practices aim to match the fitness of preserved files to their intended uses, so we would like to understand whether any particular aspects of these files set them apart from other general collections materials, for which we have identified a compression rate that maintains quality while being sensitive to the resources expended for preservation.

Which formats and files do you ingest from Internet Archive?

A description of the formats that we download and our rationale is available in this document.

We have digitized materials locally/through a vendor (or we have born-digital materials) and uploaded them to Internet Archive. Can you ingest them?

Internet Archive is just the delivery mechanism in this case, and materials that were not digitized directly by Internet Archive often do not meet our requirements and will be rejected. Prior to beginning ingest, please provide a few samples of typical items by linking us to the items in Internet Archive. We will inspect our preferred file formats to ensure they meet our specs. If other formats are desired for ingest, they should meet our specifications at http://bit.ly/1jboMIC, and it’s possible that direct ingest may be preferable.