Data Availability and APIs

HathiTrust distributes information about items in the repository (and items themselves where possible) through a variety of mechanisms.

Datasets

Datasets of full-text OCR for public domain works in HathiTrust are available. More information can be found at http://www.hathitrust.org/datasets.

APIs

Bibliographic API

The Bib API returns bibliographic, copyright, and volume information (including permanent URLs) when queried with standard identifiers such as ISBN, LCCN, and OCLC numbers. Requests can ask for either brief or full bibliographic metadata.
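As a rough illustration of an identifier-based lookup, the sketch below builds a request URL for one identifier at either level of detail. The base URL, the path layout (detail level, then identifier type, then value, with a .json suffix), and the function name bib_api_url are assumptions made for this example and are not stated on this page; confirm them against the Bib API documentation before relying on them.

```python
from urllib.parse import quote

# ASSUMPTION: base URL and path layout are illustrative, not taken from this page.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_type: str, id_value: str, detail: str = "brief") -> str:
    """Build a single-identifier lookup URL (hypothetical layout).

    id_type  -- kind of identifier, e.g. "oclc", "lccn", "isbn"
    id_value -- the identifier itself
    detail   -- "brief" or "full" bibliographic metadata
    """
    if detail not in ("brief", "full"):
        raise ValueError("detail must be 'brief' or 'full'")
    # Percent-encode both path segments so unusual identifiers stay valid.
    return f"{BIB_API_BASE}/{detail}/{quote(id_type, safe='')}/{quote(id_value, safe='')}.json"


if __name__ == "__main__":
    print(bib_api_url("oclc", "424023"))
    print(bib_api_url("isbn", "0030110408", detail="full"))
```

The function only constructs the URL; fetching and parsing the response is left out because the response format is documented on the Bib API page rather than here.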

Data (page images, OCR text, and associated metadata)

HathiTrust has developed a Data API that makes it possible to retrieve page images, OCR text, rights information, and a variety of other data about objects in the repository. A draft specification for the API has been made available to HathiTrust partners for comment. For current status information, please read the most recent Monthly Update.

OAI

The University of Michigan provides an OAI feed of MARC21 and unqualified Dublin Core records for public domain materials in HathiTrust. See http://www.lib.umich.edu/michigan-digitization-project-oai-harvesting for information about the Open Archives Initiative at the University of Michigan and the UM OAI toolkit for harvesting records.

These records can be harvested through the following URLs:
http://quod.lib.umich.edu/cgi/o/oai/oai?verb=ListRecords&metadataPrefix=marc21&set=hathitrust
http://quod.lib.umich.edu/cgi/o/oai/oai?verb=ListRecords&metadataPrefix=oai_dc&set=hathitrust

In place of "set=hathitrust" at the end of the URLs above, use "set=hathitrust:pdus" to access materials that are public domain in the United States only and "set=hathitrust:pd" to access materials that are in the public domain worldwide.
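The following is a minimal sketch of how a harvester might assemble these requests and follow a multi-part response. The base URL, metadata prefixes, and set names come from the URLs above; the resumptionToken handling follows the general OAI-PMH protocol rather than anything specific to this feed, and the function names list_records_url and next_token are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_BASE = "http://quod.lib.umich.edu/cgi/o/oai/oai"  # from the URLs above
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"     # standard OAI-PMH namespace


def list_records_url(metadata_prefix="marc21", set_spec="hathitrust",
                     resumption_token=None):
    """Build a ListRecords URL for the first request or a follow-up request."""
    if resumption_token:
        # In OAI-PMH a resumptionToken replaces metadataPrefix and set.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords",
                  "metadataPrefix": metadata_prefix,
                  "set": set_spec}
    # Keep ":" unescaped so set names read as they do in the text above.
    return f"{OAI_BASE}?{urlencode(params, safe=':')}"


def next_token(response_xml):
    """Return the resumptionToken in a response, or None if there is none."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


if __name__ == "__main__":
    # Dublin Core records for materials in the public domain worldwide.
    print(list_records_url("oai_dc", "hathitrust:pd"))
```

A harvesting loop would fetch the first URL, pass the response body to next_token, and request list_records_url(resumption_token=...) until next_token returns None.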

Tab-delimited Files (Hathifiles)

Metadata identifying the contents of the HathiTrust repository are available for download as tab-delimited files. These files include a small number of bibliographic elements to help an institution decide which records it wants to retrieve. That is, the metadata made available here are a tool that can be used to help obtain records and add links to existing records in local systems. Full documentation on these metadata is available under HathiTrust Metadata.

Using the metadata described above, an institution may acquire records through one of the following methods:

  1. The OCLC identifier can be used to retrieve records either via Connexion or from the OCLC Z39.50 server using USE attribute 12.
  2. The source institution's record number can be used in obtaining records directly from that institution. Contact the source institution directly for further information about access to their data.
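Either method starts by pulling identifiers out of the tab-delimited files. The sketch below shows one way to collect the distinct values from a single column. The column position and the sample rows are invented for illustration; the real column layout is described in the HathiTrust Metadata documentation, and the function name collect_identifiers is hypothetical.

```python
import csv
import io


def collect_identifiers(lines, column_index):
    """Return distinct non-empty values from one column, in first-seen order.

    lines        -- an open text file or any iterable of tab-delimited lines
    column_index -- zero-based position of the identifier column (ASSUMPTION:
                    look up the real position in the metadata documentation)
    """
    # QUOTE_NONE treats quote characters as ordinary data, which suits
    # plain tab-delimited exports.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    seen = {}
    for row in reader:
        if column_index >= len(row):
            continue  # skip short or malformed rows
        value = row[column_index].strip()
        if value:
            seen.setdefault(value, None)
    return list(seen)


if __name__ == "__main__":
    # Invented sample rows: volume id, rights code, OCLC number.
    sample = "vol.001\tpd\t12345\nvol.002\tic\t\nvol.003\tpd\t12345\nvol.004\tpd\t67890\n"
    print(collect_identifiers(io.StringIO(sample), 2))
```

The resulting list can then be used as input to an OCLC lookup or passed to the source institution, as described in the two methods above.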

The Bib API can be used in conjunction with these metadata for purposes such as including records in a local catalog, though it is meant for use on a small scale (see the Bib API page for details). It provides rights and permanent URL information about each volume, in addition to bibliographic information.

HathiTrust and OCLC records

OCLC and HathiTrust work together to synchronize WorldCat with the HathiTrust catalog nightly. The vast majority of records representing the HathiTrust collection are in WorldCat today, with links to the HathiTrust content.