Data Availability and APIs

HathiTrust distributes information about items in the repository (and items themselves where possible) through a variety of mechanisms.


It is possible to obtain datasets (full-text OCR) of public domain works in HathiTrust. More information can be found at


Bibliographic API

The Bib API returns bibliographic, copyright, and volume information (including permanent URLs) when queried with a standard identifier (e.g., ISBN, LCCN, or OCLC number). The API can return either brief or full bibliographic metadata.
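As a rough illustration of querying by identifier, the sketch below only builds a request URL and makes no network call. The base URL, the path layout (`/<brief|full>/<id type>/<id>.json`), and the helper name `bib_api_url` are assumptions made for this example; consult the Bib API page for the authoritative request format.

```python
from urllib.parse import quote

# Assumed base URL and path layout; verify against the Bib API page.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"

# Identifier types named in the text above (the API may accept others).
ID_TYPES = {"isbn", "lccn", "oclc"}
# The two levels of bibliographic detail mentioned in the text above.
DETAIL_LEVELS = {"brief", "full"}


def bib_api_url(id_type: str, value: str, detail: str = "brief") -> str:
    """Build a hypothetical single-identifier Bib API request URL."""
    id_type = id_type.lower()
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type!r}")
    if detail not in DETAIL_LEVELS:
        raise ValueError(f"detail must be one of {sorted(DETAIL_LEVELS)}")
    # Percent-encode the identifier so it is safe inside a URL path segment.
    return f"{BIB_API_BASE}/{detail}/{id_type}/{quote(value.strip(), safe='')}.json"
```

For example, `bib_api_url("oclc", "424023", detail="full")` would produce a URL asking for the full record associated with that OCLC number, assuming the layout above is correct.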

Data (page images, OCR text, and associated metadata)

HathiTrust has developed a Data API that makes it possible to retrieve page images, OCR text, rights information, and a variety of other data about objects in the repository. A draft specification for the API has been made available for comment from the HathiTrust partners. Please read the most recent Monthly Update for current status information.


The University of Michigan provides an OAI feed of MARC21 and unqualified Dublin Core records for public domain materials in HathiTrust. OAI (Open Archives Initiative) allows harvesters to access records made available using the OAI protocol. Harvesters using our OAI feed can retrieve new and updated records and discover whether any records have been deleted. For best practices related to OAI, and a list of potential harvesters, see

This link describes our OAI repository.

These records can be harvested through the following URLs:

In place of "set=hathitrust" at the end of the URLs above, use "set=hathitrust:pdus" to access materials that are public domain in the United States only and "set=hathitrust:pd" to access materials that are in the public domain worldwide.
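The set substitution described above can be sketched as follows. This is a minimal illustration, not the feed's documented interface: `OAI_BASE_URL` is a hypothetical placeholder standing in for the real feed address, the helper name and the scope labels are invented for the example, and `marc21` as a metadata prefix is an assumption. `ListRecords`, `metadataPrefix`, `set`, and the `oai_dc` prefix come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the real OAI feed address.
OAI_BASE_URL = "https://example.org/oai"

# Set names taken from the text above, keyed by illustrative labels.
SETS = {
    "all": "hathitrust",        # all public domain records in the feed
    "pdus": "hathitrust:pdus",  # public domain in the United States only
    "pd": "hathitrust:pd",      # public domain worldwide
}


def list_records_url(metadata_prefix: str, scope: str = "all",
                     base_url: str = OAI_BASE_URL) -> str:
    """Build an OAI-PMH ListRecords URL for one of the sets above."""
    if scope not in SETS:
        raise ValueError(f"scope must be one of {sorted(SETS)}")
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,  # e.g. "oai_dc"; "marc21" is assumed
        "set": SETS[scope],
    }
    # Keep ':' unescaped so the set name reads as it does in the text.
    return f"{base_url}?{urlencode(params, safe=':')}"
```

Calling `list_records_url("oai_dc", scope="pd")` yields a URL ending in `set=hathitrust:pd`, which restricts the harvest to worldwide public domain materials.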

Tab-delimited Files (Hathifiles)

Metadata identifying the contents of the HathiTrust repository are available for download as tab-delimited files. These files include a small number of bibliographic elements to help an institution decide which records it wants to retrieve. That is, the metadata made available here are a tool that can be used to help obtain records and add links to existing records in local systems. Full documentation on these metadata is available under HathiTrust Metadata. Using the metadata described above, an institution may acquire records through one of the following methods:

  1. The OCLC identifier can be used to retrieve records either via Connexion or from the OCLC Z39.50 server using USE attribute 12.
  2. The source institution's record number can be used in obtaining records directly from that institution. Contact the source institution directly for further information about access to their data.

The Bib API can be used in conjunction with these metadata for purposes such as including records in a local catalog, though it is meant for use on a small scale (see the Bib API page for details). It provides rights and permanent URL information about each volume, in addition to bibliographic information.
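To make the workflow above concrete, the sketch below reads tab-delimited rows and sorts volumes by which retrieval method applies: an OCLC number where one exists, otherwise the source institution's record number. The column names and order in `FIELDS`, the helper names, and the sample rows are all invented for illustration; the actual file layout is defined in the HathiTrust Metadata documentation.

```python
import csv
import io

# Hypothetical column order; the real layout is documented by HathiTrust.
FIELDS = ["volume_id", "oclc_num", "source", "source_record_num", "title"]

# Invented sample rows, for illustration only.
SAMPLE = (
    "test.001\t12345\tEX\tb111\tA sample title\n"
    "test.002\t\tEX\tb999\tAnother sample title\n"
)


def read_rows(handle, fields=FIELDS):
    """Yield one dict per well-formed tab-delimited line."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(fields):
            continue  # skip lines that do not match the assumed layout
        yield dict(zip(fields, row))


def lookup_keys(rows):
    """Split volumes by retrieval method: OCLC number or source record."""
    by_oclc, by_source = [], []
    for row in rows:
        if row["oclc_num"]:
            by_oclc.append((row["volume_id"], row["oclc_num"]))
        else:
            by_source.append(
                (row["volume_id"], row["source"], row["source_record_num"])
            )
    return by_oclc, by_source


by_oclc, by_source = lookup_keys(read_rows(io.StringIO(SAMPLE)))
```

In practice the `io.StringIO(SAMPLE)` argument would be replaced by an open handle on a downloaded file, and the two resulting lists would feed whichever record source the institution uses.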

Renewal ID file

A tab-delimited file that pairs US copyright renewal registration numbers with HathiTrust volume identifiers is available for download. These data resulted from CRMS-US copyright reviews. In the context of CRMS historical copyright review data, a Renewal ID might represent a renewal registration for the exact edition, for a prior edition of the work published in 1923 or later, or for partial content within the volume, such as a short story. The data file for download is available at

For more information on the file see the page Renewal ID data file.

HathiTrust and OCLC records

OCLC and HathiTrust work together to synchronize WorldCat with the HathiTrust catalog nightly. The vast majority of records representing the HathiTrust collection are in WorldCat today, with links to the HathiTrust content.