Hathifiles include information derived from the bibliographic record (e.g., title, publisher, language, and commonly used identifiers), rights and access codes, and information about the source of the item.
The “Hathifiles Description” page describes the fields included in the hathifiles and suggests potential use cases.
Files provided below
A monthly file is uploaded on the first of every month with a row for every item that is in the HathiTrust collection at the moment the file is created. The filename begins with “hathi_full_”. These files tend to be large and may be difficult to open with standard spreadsheet software or text editors. You may need to work with the files programmatically (e.g., using Python to extract desired data).
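As a rough illustration of working with a full file programmatically, the Python sketch below reads a file one row at a time instead of loading it all into memory. It assumes the file is tab-delimited UTF-8 text, optionally gzip-compressed; confirm the actual layout on the “Hathifiles Description” page. The function names, the column position, and the example value are hypothetical.

```python
import csv
import gzip
from typing import Iterator, List


def iter_rows(path: str) -> Iterator[List[str]]:
    """Yield one row at a time so the whole file never sits in memory."""
    # Assumption: files ending in ".gz" are gzip-compressed text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, mode="rt", encoding="utf-8", newline="") as handle:
        # Assumption: tab-delimited, with quote characters treated as plain data.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        yield from reader


def count_matching(path: str, column_index: int, value: str) -> int:
    """Count rows whose field at column_index equals value."""
    return sum(
        1
        for row in iter_rows(path)
        if len(row) > column_index and row[column_index] == value
    )
```

For example, count_matching("hathi_full_sample.txt.gz", 1, "allow") would count the rows whose second field is "allow"; both the file name and the value here are placeholders.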
An update file is uploaded every day and contains a row for every item that changed in the previous 24 hours. The filename begins with “hathi_upd_”. An item is included in an update file if any of the following occurred: the item was newly deposited into the collection, a new copy of the digital item replaced the previous copy, the item's rights and access status changed, or the contributor provided a new bibliographic record.
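One possible way to keep a local copy current is to overlay each day's update rows on the rows from the most recent full file. The sketch below assumes that each row carries a unique item identifier in a known column (the first column by default); that assumption, and the function name apply_updates, are illustrative and not taken from the hathifiles documentation.

```python
from typing import Dict, Iterable, List


def apply_updates(
    snapshot: Dict[str, List[str]],
    update_rows: Iterable[List[str]],
    key_index: int = 0,
) -> Dict[str, List[str]]:
    """Return a new mapping in which update rows replace or add to the snapshot."""
    merged = dict(snapshot)  # shallow copy so the caller's snapshot is untouched
    for row in update_rows:
        if len(row) <= key_index:
            continue  # skip rows too short to contain the identifier
        # Assumption: the field at key_index uniquely identifies an item.
        merged[row[key_index]] = row
    return merged
```

Applied to update files in date order, later rows for the same identifier overwrite earlier ones, which matches the idea that an update row carries the item's latest state.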
A “header” file is also included below. This file contains one row of labels for the data elements included in the hathifiles. It can be combined with the regular hathifiles for ease of working with the data. This header file is only updated when a new data element is added to the hathifiles.
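Combining the header file with a data file could look like the following Python sketch, which pairs each label with the corresponding field in every row. It assumes both files are tab-delimited UTF-8 text and that the data file may be gzip-compressed; the function names and any file paths passed to them are hypothetical.

```python
import csv
import gzip
from typing import Dict, Iterator, List


def read_labels(header_path: str) -> List[str]:
    """Read the single row of labels from the header file."""
    with open(header_path, mode="rt", encoding="utf-8", newline="") as handle:
        # Assumption: labels are tab-separated on one line.
        return next(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def iter_labeled_rows(header_path: str, data_path: str) -> Iterator[Dict[str, str]]:
    """Yield each data row as a dict keyed by the header labels."""
    labels = read_labels(header_path)
    # Assumption: data files ending in ".gz" are gzip-compressed text.
    opener = gzip.open if data_path.endswith(".gz") else open
    with opener(data_path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            # zip stops at the shorter sequence, so extra fields or labels are dropped.
            yield dict(zip(labels, row))
```

Because the header file changes only when a data element is added, the same labels can be reused across many data files until that happens.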