The “hathifiles” are a standard metadata format we use at HathiTrust to distribute information about items in the HathiTrust collection. They include information derived from the bibliographic record (e.g., title, publisher, language, and commonly used identifiers), rights and access codes, and information about the source of the item.
Many of the fields described below are extracted from the MARC record, a format commonly used in library catalogs to describe a work. Where a field comes from MARC, its description notes which MARC fields and subfields are included in the data element. See “What is a MARC record, and why is it important?” to learn more about the MARC format.
Multiple occurrences of OCLC number, ISBN, ISSN, or LCCN are comma-delimited within the appropriate data element.
If there is no corresponding data for a field, the field will be empty.
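The two rules above (comma-delimited multiple values, empty fields when there is no data) can be handled by one small helper. This is a minimal sketch, not an official HathiTrust tool: the function name `split_multi` is hypothetical, and the sample identifiers are made up for illustration.

```python
def split_multi(value: str) -> list[str]:
    """Split a comma-delimited hathifiles field into a list of values.

    An empty field (no corresponding data) yields an empty list.
    """
    return [part.strip() for part in value.split(",") if part.strip()]


# Made-up sample values, for illustration only.
print(split_multi("1234567,7654321"))  # a field holding two identifiers
print(split_multi(""))                 # a field with no data
```

Returning an empty list for an empty field lets calling code loop over the result without a separate check for missing data.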
The fields are provided in the hathifiles in the order described below.
When new elements are added to the hathifiles, they are added to the end of the row.
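Because new elements are only ever appended to the end of a row, a reader that pairs values with field names by position keeps working when columns are added later. The sketch below illustrates that idea under stated assumptions: it assumes rows are tab-delimited (this section does not state the delimiter), `parse_row` is a hypothetical helper, and the sample row is illustrative rather than real data.

```python
def parse_row(line: str, field_names: list[str], delimiter: str = "\t") -> dict[str, str]:
    """Pair each value in a row with its field name, by position.

    Extra trailing columns (elements added after this field list was
    written) are ignored; missing trailing columns become empty strings.
    """
    values = line.rstrip("\r\n").split(delimiter)
    # Pad short rows so every known field is present in the result.
    values += [""] * (len(field_names) - len(values))
    # zip() stops at the shorter sequence, dropping unknown extra columns.
    return dict(zip(field_names, values))


# First four field names, in the order given in the table below.
FIELD_NAMES = ["htid", "access", "rights", "ht_bib_key"]

# Illustrative row with one extra column standing in for a future element.
sample = "mdp.39015013764785\tallow\tpd\t001285647\tfuture_value\n"
row = parse_row(sample, FIELD_NAMES)
print(row["htid"], row["access"])
```

In practice the full list of field names would come from the header file rather than being hard-coded.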
|Data Element||Field Name in Header File||Description|
|Volume Identifier||htid||This is the permanent HathiTrust item identifier. Each item identifier is unique. This identifier can be used to construct a persistent handle URL or other link that directs users to the item. Handles can be constructed as follows: https://hdl.handle.net/2027/volume_identifier. For example: https://hdl.handle.net/2027/mdp.39015013764785|
|Access||access||An access code that describes whether or not users can view the item. The access code is derived from the rights attribute. Permitted values include: allow - end users can view the item; deny - end users cannot view the item. Notes: Items with a copyright status of “public domain in the United States” (i.e., only users within the United States can view the item) have the value of “allow”. Items with a copyright status of “in-copyright in the United States” (i.e., only users outside the United States can view the item) have the value of “allow”. Also see the “Rights code” and “Access profile” data elements below.|
|Rights code||rights||A code (also referred to as “rights attribute”) that describes the copyright status, license or access. See the full list of codes.|
|HathiTrust record number||ht_bib_key||HathiTrust's record number for the associated bibliographic record. HathiTrust record numbers are not permanent and can change over time. URLs to HathiTrust catalog records can be constructed as follows: https://catalog.hathitrust.org/Record/record_number. For example: https://catalog.hathitrust.org/Record/001285647|
|Enumeration/Chronology||description||Enumeration (e.g., “vol.1”) and chronology (e.g., “1883”, “Jun-Oct 1927”) data for this item.|
|Source||source||Code identifying the source of the bibliographic record. Currently, the NUC code of the originating library is used for the code.|
|Source institution record number||source_bib_num||Local bibliographic record number used in the catalog of the library that contributed the item.|
|OCLC numbers||oclc_num||OCLC number(s) for the bibliographic record. Multiple values are separated by a comma.|
|ISBNs||isbn||ISBN(s) for the bibliographic record. Multiple values are separated by a comma.|
|ISSNs||issn||ISSN(s) for the bibliographic record. Multiple values are separated by a comma.|
|LCCNs||lccn||LCCN(s) for the bibliographic record. Multiple values are separated by a comma.|
|Title||title||The title of the work. May include an author if provided in the MARC field 245 $c. Includes all subfields of the 245 MARC field.|
|Publishing information||imprint||The name of the publisher and the date of publication. Includes subfields b and c of the 260 MARC field.|
|Rights determination reason code||rights_reason_code||This code describes how the “Rights” code was set. See the full list of Reason Codes.|
|Date of last update||rights_timestamp||This date may change when any of the following activities occur: the item was newly deposited into the collection; a new copy of the digital item overrode the previous copy; the rights and access status has changed; a new bibliographic record was provided by the contributor.|
|Government Document||us_gov_doc_flag||United States federal government document indicator. Permitted values include: 1 - the item is a US federal government document; 0 - the item is not a US federal government document.|
|Publication Date||rights_date_used||Derived publication date of the item. The date is derived from data provided in the 008 field of the MARC record and the enumeration/chronology data for the item. In cases where the date of the item could not be easily determined by HathiTrust processes, the date will be listed in the hathifiles as 9999.|
|Publication Place||pub_place||The place of publication for the work. The codes included in this data element were originally provided in bytes 15-17 of the 008 MARC field. See the full list of country codes in the “MARC Code List for Countries.”|
|Language||lang||The primary language of the work. The codes included in this data element were originally provided in bytes 35-37 of the 008 MARC field. See the full list of language codes in the “MARC Code List for Languages.”|
|Bibliographic Format||bib_fmt||Bibliographic format of the work. Definitions of format values can be found on the Library of Congress website. Permitted values include: BK - monographic book; SE - serial, continuing resources (e.g., journals, newspapers, periodicals); CF - computer files and electronic resources; MP - maps, including atlases and sheet maps; MU - music, including sheet music; VM - visual material; MX - mixed materials.|
|Collection Code||collection_code||An administrative code used to share information between Zephir and the HathiTrust repository.*|
|Content Provider Code||content_provider_code||The institution that originally contributed the content. Codes used are listed at https://www.hathitrust.org/institution_identifiers.*|
|Responsible Entity Code||responsible_entity_code||The institution that took responsibility for accessioning the content into HathiTrust, in cases where the content provider was not a member of HathiTrust. Codes used are listed at https://www.hathitrust.org/institution_identifiers.*|
|Digitization Source||digitization_agent_code||The organization that digitized the content. Codes used are listed at https://www.hathitrust.org/rights_database#Sources.*|
|Access profile||access_profile_code||Access profiles indicate whether an item has view or download restrictions. They work in combination with the rights codes (included in the hathifiles in data element “rights”) to determine user access. Permitted values include: open - Items with this value do not have any download restrictions; google - Items with this value have some download restrictions. Any user anywhere can download one page at a time. Member-affiliated users can download the full PDF; page - Items with this value can be viewed on the HathiTrust website. Users can download individual pages but cannot download the full PDF, regardless of member affiliation; page+lowres - Users can download the item in a lower resolution with a watermark only.|
|Author||author||The name of the person, company or meeting that created the work. Author names are typically in authorized format, meaning that the name is provided in a standardized form used across multiple catalogs and databases. Includes the following fields from the MARC record: 100 $a $b $c $d - Name of the person who authored the work; 110 $a $b $c $d - Name of a corporation or organization that authored the work; 111 $a $c $d - Name of a meeting or conference that is responsible for creating the work.|
*For more information about codes used in HathiTrust internal processes, see the page at https://www.hathitrust.org/internal_codes.
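
The URL patterns given for `htid` and `ht_bib_key`, and the 9999 placeholder used in `rights_date_used`, can be applied as in the sketch below. The helper names are hypothetical, and the code assumes that `rights_date_used` otherwise holds a plain numeric year, which the table implies but does not state. The identifiers are simply concatenated, as the table describes, with no URL escaping applied.

```python
from typing import Optional

HANDLE_PREFIX = "https://hdl.handle.net/2027/"
CATALOG_PREFIX = "https://catalog.hathitrust.org/Record/"
UNKNOWN_DATE = "9999"  # listed when the date could not be easily determined


def handle_url(htid: str) -> str:
    """Build the persistent handle URL for an item from its htid."""
    return HANDLE_PREFIX + htid


def catalog_url(ht_bib_key: str) -> str:
    """Build the catalog record URL; record numbers can change over time."""
    return CATALOG_PREFIX + ht_bib_key


def publication_year(rights_date_used: str) -> Optional[int]:
    """Return the derived year, or None if it is empty, 9999, or non-numeric."""
    if rights_date_used == UNKNOWN_DATE or not rights_date_used.isdigit():
        return None
    return int(rights_date_used)


print(handle_url("mdp.39015013764785"))
print(catalog_url("001285647"))
print(publication_year("9999"))
```

Since record numbers are not permanent, the handle URL is the safer choice for links that need to keep working.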