HathiTrust Policy on Metadata Sharing and Use

Recommended by the Metadata Policy, Strategy, Use and Sharing Advisory Group, March 2019; approved by the Program Steering Committee, April 4, 2019; and approved by the Board of Governors, June 3, 2019

Sub-Policies Regarding the Sharing of Specific Types of Metadata

Updated between 2021 and 2022

Metadata Sharing: Please note that under HathiTrust Digital Library’s (HTDL) Metadata Sharing Policy, independent users, member institutions, and other third parties are free to harvest (for example, through our OAI feed or the Hathifiles), modify, and/or otherwise make use of any metadata contained in HTDL unless restricted by contractual obligations residing with the parties that have contributed the metadata (“Depositing Institutions”) to HTDL. HathiTrust cannot be aware of every contractual obligation incumbent upon Depositing Institutions and as such makes no warranties on data made available through any of its several sharing mechanisms.

HathiTrust Policy on Metadata Sharing and Use

POLICY STATEMENT 

To encourage discovery of materials, to support research, and to enable a variety of current and future uses, HathiTrust provides open access to HathiTrust-collected, -managed and -generated metadata (“HathiTrust metadata”, hereafter) where it has been determined that legal, contractual, policy, ethical, practical, and strategic considerations allow. 

For specific types of metadata, HathiTrust considers relevant factors in making policy determinations about metadata sharing. A list of these types of metadata appears at the end of this document. 

For those types of HathiTrust metadata which HathiTrust openly shares, HathiTrust: 

  • Asserts no additional intellectual property rights and expressly waives rights that it may have with respect to the metadata, including rights arising from its own actions with and upon the metadata. To the extent that HathiTrust’s own contributions in selecting, modifying, correcting, enhancing, and arranging the metadata may be protected by copyright, HathiTrust dedicates such contributions to the public domain pursuant to a Creative Commons 0 (CC0) Public Domain Dedication; 
  • To the extent that members’ and contributors’ contributions in selecting, modifying, correcting, enhancing, and arranging the metadata may be protected by copyright, HathiTrust requires that the members and contributors dedicate such contributions to the public domain pursuant to a Creative Commons 0 (CC0) Public Domain Dedication; 
  • Allows HathiTrust users to access, download, collect, modify, and/or otherwise use it, noting that any use of HathiTrust metadata must conform to all applicable laws and regulations in a given jurisdiction and applicable contractual restrictions; and 
  • Requests, as a matter of courtesy and scholarship, that metadata openly provided by HathiTrust be given attribution when feasible and in accord with community norms. 

Further, it is noted that:

  • HathiTrust exerts responsible stewardship over the metadata it receives and manages;
  • HathiTrust neither intends nor claims to be the repository of record for all metadata it manages and shares;
  • HathiTrust will apply a principle of openness as part of future calls for metadata contributions for ongoing operations and for special collaborative programs, and communicate that principle clearly to contributors and members; 
  • Metadata obtained from HathiTrust through this policy is not static and is subject to change. The metadata may be updated at any time in accordance with HathiTrust policies on metadata maintenance; 
  • HathiTrust makes no representations or warranties of any kind concerning the metadata;
  • HathiTrust offers the metadata as-is. Users use the metadata provided at their own risk; and
  • HathiTrust communicates the terms of the Policy on its website, in its Agreements, and in additional policies and other contexts where HathiTrust has a relationship with metadata contributors and metadata users. 

REASON FOR POLICY 

The HathiTrust mission is to contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge. HathiTrust metadata, in its many types, is a valuable and instrumental asset for the membership and the larger public as a common good. This policy provides a framework for sharing and using HathiTrust metadata in alignment with this mission.

This policy is grounded in the beliefs that: 

  • The direct sharing of metadata by HathiTrust members and contributors in collaboration with HathiTrust supports the HathiTrust mission and is undertaken for the common good; 
  • The vast majority of metadata contributed to and managed by HathiTrust is not subject to copyright protection because it either expresses only objective facts, or constitutes expression so limited by the number of ways the underlying ideas can be expressed that such expression has merged with those ideas. Facts and ideas may not be copyrighted; and, therefore, 
  • Wherever legal, contractual, policy, ethical, practical, and strategic considerations and adherence to copyright laws allow, HathiTrust endeavors to make its metadata openly available for sharing in human-understandable and machine-processable forms suitable for unfettered use. 

RELATED POLICIES 

Sub-Policies Regarding the Sharing of Specific Types of Metadata

The list below is indicative of the types of metadata that are collected and/or managed by HathiTrust. Each metadata type may have specific legal, contractual, policy, ethical, practical, and/or strategic considerations in regard to sharing which must be addressed. As sub-policy statements regarding the sharing of specific metadata types are developed, documented, and adopted, they will be publicly communicated on the HathiTrust website alongside this document. 

Bibliographic Metadata Sharing Policy

Last updated October 7, 2021

This policy is a sub-policy of the HathiTrust Policy on Metadata Sharing and Use. It is intended to address the sharing of bibliographic metadata collected, managed, and generated by HathiTrust. In accordance with the HathiTrust Policy on Metadata Sharing and Use, to the extent HathiTrust owns any copyright in bibliographic metadata, that metadata will be shared under a CC0 1.0 Universal Public Domain Dedication.

When sharing bibliographic metadata under this sub-policy, HathiTrust will provide a link to this sub-policy.

Categories of Bibliographic Metadata

Bibliographic Metadata

Bibliographic metadata is metadata contributed by HathiTrust members at the time of content ingest that describes print versions of digitized items in the HathiTrust collection. Bibliographic metadata submitted by members meets or exceeds HathiTrust’s Bibliographic Metadata Specifications and is managed in Zephir, the HathiTrust metadata management system. During bibliographic metadata ingest and processing, HathiTrust validates, normalizes, puts into consistent locations, removes, and/or adds some metadata values contributed by members. 

Bibliographic metadata is used in the HathiTrust catalog and shared through the Hathifiles, Bibliographic API, OAI feed, and Collection Builder exports. Bibliographic metadata is also part of the datasets shared with researchers via rsync.

For the purposes of this sub-policy, bibliographic metadata does not include rights-related metadata, which is covered by the Rights-Related Metadata Sharing Policy (below). 

Bibliographic Metadata for U.S. Federal Government Documents

The HathiTrust Policy on Metadata Sharing and Use refers to bibliographic metadata for federal government documents separately from other bibliographic metadata. Both types of bibliographic metadata are covered under this policy.

Exceptions to Sharing

None

Community Norms

HathiTrust requests that you review and act in accordance with the following HathiTrust community norms with respect to your use of the Metadata Records:

  1. In accordance with HathiTrust Policy on Metadata Sharing and Use, HathiTrust requests, “as a matter of courtesy and scholarship, that metadata openly provided by HathiTrust be given attribution” to its creator or source when feasible.
  2. With respect to Metadata Records consisting of or contained in records HathiTrust or its members have obtained from the OCLC WorldCat database, HathiTrust requests that you respect and act in accordance with the community norms set forth in the WorldCat Rights and Responsibilities for the OCLC Cooperative.

By observing these community norms, you will be helping to promote good practices, foster trust among partners, and encourage growth of the open metadata community.

Rights-Related Metadata Sharing Policy

Last updated September 8, 2021

This policy is a sub-policy of the HathiTrust Policy on Metadata Sharing and Use. It is intended to address the sharing of metadata related to the copyright status of collection volumes collected, managed, and generated by HathiTrust. The rights-related metadata is stored in multiple locations within HathiTrust’s data model, often cross-walked to different databases and disseminated across different access points. The rights-related metadata falls within three categories defined by the purpose of the metadata: rights metadata, Copyright Review Program Rights Investigation data, and market availability data. 

Sharing of associated bibliographic metadata that appears in the Rights Database and in the Copyright Review Management System (CRMS) is governed by the Bibliographic Metadata Sharing Policy (above).

Categories of Rights-Related Metadata

Rights Metadata

The rights metadata stored by HathiTrust is data relevant to the copyright status of a collection volume. This data is, for the most part, stored within HathiTrust’s rights database. The rights database collects and stores metadata from CRMS and the HathiTrust Metadata Management system (Zephir). The data in the rights database is used by HathiTrust to programmatically set access levels to volumes.

Copyright Review Program Rights Investigation Metadata

This metadata is created during the copyright review process. It is stored within CRMS and used to determine the copyright status of volumes as well as to manage and document the review process.

Market Availability Metadata

Unlike the rights metadata that describes the current status and use of a volume, the market availability metadata describes whether a title is available for sale. This data was generated as part of a pilot program to make copies of damaged, deteriorating, lost or stolen items available to authorized users in accordance with Section 108 of the United States Copyright Act. Although the exact metadata fields involved were not available when this policy was written, this policy considers the likely fields required for such work. 

Metadata fields used in the three categories above are listed in Types of Rights-Related Metadata (below). The Policy on Metadata Sharing and Use states that “HathiTrust provides open access to HathiTrust-collected, -managed and -generated metadata . . . where it has been determined that legal, contractual, policy, ethical, practical, and strategic considerations allow.” Fields from this list that should not be shared are identified later in this policy. The principles of this policy can also be applied to sharing of future rights-related fields. Fields not restricted by the considerations below can be shared openly.

Exceptions to Sharing

Sensitive data about copyright determinations

Some copyright determination metadata is sensitive and should not be shared due to legal considerations, including litigation risk. Current fields that should not be shared for this reason are “Note,” “Note Category,” and “Ticket” from the CRMS database.

Preserving intellectual privacy of CRMS reviewers 

Staff and volunteer notes about internal processes should not be shared due to practical and ethical considerations. Sharing such notes would have a chilling effect on recordkeeping.

A current field that should not be shared for this reason is the “Note” field from the CRMS database.

Preserving privacy of individuals

Personally identifiable information about CRMS reviewers and other individuals should not be shared due to privacy (ethical) considerations. Current fields that should not be shared for this reason are the “Reviewer” field from the CRMS database and any personally identifying information in Market Availability Data, such as the name of a reviewer. The “Reviewer” field in CRMS contains email addresses; if these were replaced with anonymous identifiers (e.g., numbers), this anonymized data could be shared.
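The three exceptions above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each CRMS review is available as a dictionary keyed by the field names listed in this policy; the function name and record layout are hypothetical and do not describe how CRMS actually stores or exports data. The sketch drops the “Note,” “Note Category,” and “Ticket” fields and replaces each reviewer email address with a number.

```python
from typing import Dict, Iterable, List

# CRMS fields that this policy says should not be shared.
WITHHELD_FIELDS = {"Note", "Note Category", "Ticket"}


def prepare_reviews_for_sharing(
    reviews: Iterable[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Return copies of review records that omit withheld fields and
    carry an anonymous number in place of the reviewer's email address.

    The record layout (a dict keyed by field name) is an assumption
    made for this example only.
    """
    reviewer_numbers: Dict[str, int] = {}  # email -> number; must stay private
    shareable = []
    for review in reviews:
        cleaned = {k: v for k, v in review.items() if k not in WITHHELD_FIELDS}
        if "Reviewer" in cleaned:
            email = cleaned["Reviewer"]
            # The same email always maps to the same number, assigned
            # in order of first appearance.
            number = reviewer_numbers.setdefault(email, len(reviewer_numbers) + 1)
            cleaned["Reviewer"] = str(number)
        shareable.append(cleaned)
    return shareable
```

The email-to-number mapping would itself be personally identifiable information and would need to remain unshared. Any personally identifying information appearing in other fields would still require separate review.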

Types of Rights-Related Metadata

Last updated August 20, 2021

This list of rights-related metadata was prepared by the HathiTrust Metadata Sharing Policy Task Force rights subgroup with help from HathiTrust staff in spring and summer 2021. It reflects the group’s understanding at the time the Rights-Related Metadata Sharing Policy (above) was drafted.

Rights Metadata

HathiTrust Rights Database provides general information about this type of metadata.

Rights metadata is stored in the Rights Database and in published MARC records. It is available in the Hathifiles, Bibliographic API, and OAI feed.

Rights Database fields

  • Attribute (characterizes the copyright status of the volume, e.g., “pd”) 
    • MARC 974$r
  • Reason (accounts for why the volume was given that copyright status, e.g., “bib”)
    • MARC 974$q
  • Source (digitizing agent for volume)
    • MARC 974$s
  • Access Profile (type of access provided in page-turner application, e.g., “open,” “google,” “page,” or “page+lowres”) (in Hathifiles: “access_profile_code”)

In published MARC records

  • 974$r: Attribute (see above)
  • 974$q: Reason (see above)
  • 974$s: Source (see above)
  • 974$t: Explanation of automatic copyright determinations based on bibliographic record, e.g., “US bib date1 < 1926,” along with “$r pd $q bib.”
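As an illustration of the subfield list above, the following sketch maps MARC 974 subfield codes to the field names used in this policy. It assumes the subfields have already been read from a record into a list of (code, value) pairs; that input format and the function name are hypothetical and are not tied to any particular MARC library.

```python
from typing import Dict, Iterable, Tuple

# Subfield codes of MARC field 974 as described in this policy.
SUBFIELD_NAMES = {
    "r": "attribute",    # copyright status of the volume
    "q": "reason",       # why the volume was given that status
    "s": "source",       # digitizing agent for the volume
    "t": "explanation",  # basis of an automatic determination
}


def describe_974(subfields: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Label the rights-related subfields of one 974 field.

    `subfields` is an assumed input format: (code, value) pairs already
    extracted from the record. Codes not listed above are ignored.
    """
    return {
        SUBFIELD_NAMES[code]: value
        for code, value in subfields
        if code in SUBFIELD_NAMES
    }
```

For the example given in the list, `describe_974([("r", "pd"), ("q", "bib"), ("t", "US bib date1 < 1926")])` yields a dictionary with the attribute “pd,” the reason “bib,” and the explanation text.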

In Hathifiles (rights-related fields only) (field descriptions)

  • access
  • rights
  • source (This is the source of the bibliographic data, not the same as the “source” field in Rights Database or MARC records.)
  • rights_reason_code
  • rights_timestamp
  • us_gov_doc_flag
  • rights_date_used
  • pub_place
  • digitization_agent_code
  • access_profile_code

How this is currently available:

  • In MARC records (field 974), which are public
  • Bibliographic API (“rightsCode” and “usRightsString”)
  • The code in the MARC 974$r is also used to determine the contents of the OAI feed; any volume whose code is “pd” or “pdus” is included. That information is contained in the MARC 856$r.
  • Hathifiles
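The OAI feed rule noted in the list above (only volumes whose 974$r code is “pd” or “pdus” are included) amounts to a simple membership test. The sketch below is illustrative only; the function name is hypothetical and it does not describe how the feed is actually produced.

```python
# Attribute codes (MARC 974$r) that this document says qualify a volume
# for the OAI feed.
OAI_ATTRIBUTES = {"pd", "pdus"}


def included_in_oai_feed(attribute: str) -> bool:
    """Return True when a volume's 974$r code qualifies it for the feed."""
    return attribute in OAI_ATTRIBUTES
```

Any code other than these two, including an empty value, is treated as excluded.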

Copyright Review Program Rights Investigation Data

The following metadata are stored within the Copyright Review Management System (CRMS). They are not currently publicly available.

Metadata about volumes

This bibliographic data is imported from Zephir at the time the item is reviewed. It is cached and is updated only if the item is brought back later for re-review.

  • Identifier (HTID for the volume)
  • System ID
  • Title
  • Author
  • Pub Date
  • Country

Metadata about reviews

This data is generated within CRMS.

  • Review Date
  • Export Date
    • Date on which the rights change was sent to the Rights Database as a recommended rights change
  • Status
    • Numeric workflow routing code indicating how far a volume has progressed through the review process
    • This field does not indicate copyright status.
  • Legacy
    • If flag is present, indicates review was conducted before CRMS was built, then imported into CRMS.
  • Reviewer
    • Identifies the reviewer
    • Expert reviewers have two accounts and thus two different reviewer identifiers.
  • Expert
    • If flag is present, the review comes from an expert reviewer.
  • Attribute
    • Characterizes the copyright status of the volume, e.g., “pd”
    • Same as “Attribute” field in Rights Database and MARC 974$r (see above)
  • Reason
    • Accounts for why the volume was given that copyright status, e.g., “bib”
    • Same as “Reason” field in Rights Database and MARC 974$q (see above)
  • Note Category 
    • Controlled vocabulary of categories describing “Note” field
  • Note
    • Free text field documenting reasoning behind the reviewer’s determination
  • Priority
    • Number indicating the priority of item in queue to be served up for a review
  • Verdict
    • System’s verdict about the review (e.g., validation, invalidation, or neutral)
  • Swiss
    • If flag is present, “neutral” verdict is possible.
    • This field exists for expert reviews only.
  • Project
    • Indicates inclusion of item in a larger project (e.g., State Doc Review project)
  • Ticket
    • User support ticket number
  • Author or organization
    • Free text entry of entity that has authorized a license to HathiTrust

Market Availability Data


Data about the availability of unused replacement copies would support a service providing replacement copies under Section 108(c) of the U.S. Copyright Act. The HathiTrust Metadata Sharing Policy Task Force based its work in this area on hypothetical metadata fields.

Hypothetical fields:

  • Condition of local member institution copy
  • Market availability review
    • Date review was conducted
    • Reviewer
    • Reason/where reviewer checked
    • Outcome (binary field indicating market availability)

Print Holdings Metadata Sharing Policy

Last updated January 21, 2022

This policy is a sub-policy of the HathiTrust Policy on Metadata Sharing and Use. It is intended to address the sharing of print holdings metadata. Print holdings metadata is supplied by contributing institutions. Metadata categories include:

  • Item and resource identifiers
  • Holdings status (held, missing, lost)
  • Condition
  • Enumeration and chronology
  • Government document indicator

Exceptions to Sharing

None

Shared Print Retention Metadata Sharing Policy

Last updated February 18, 2022

This policy is a sub-policy of the HathiTrust Policy on Metadata Sharing and Use. It is intended to address the sharing of metadata related to retention commitments made by HathiTrust Retention Libraries through the Shared Print Program. Retention Libraries identify the print holdings they are willing to retain using a process and metadata requirements defined by HathiTrust. HathiTrust maintains metadata about these retention commitments in a Shared Print Registry.

Unlike most metadata covered by the HathiTrust Policy on Metadata Sharing and Use, which primarily describes digital objects in the HathiTrust Digital Repository, the shared print metadata describes some specific characteristics of member library print collections. Shared print data helps the library community and others make more informed collection decisions.

Shared print retention metadata is supplied by the contributing institution. Metadata categories include:

  • Institutional identifiers
  • Item and resource identifiers
  • Item shelving type and shelf location
  • Item lending and scanning policies
  • Other retention commitments
  • Item ownership history

Exceptions to Sharing

None

METS Metadata Sharing Policy

Last updated March 22, 2022

This policy is a sub-policy of the HathiTrust Policy on Metadata Sharing and Use. It is intended to address the sharing of data stored in files conforming to the Metadata Encoding and Transmission Standard (“METS”) collected, managed and generated by HathiTrust. Sharing of associated bibliographic metadata that appears in the METS file is governed by the Bibliographic Metadata Sharing Policy (above).

Description of Data

There are generally two types of METS files collected and stored by HathiTrust. The first are “Source” METS files. “Source” METS files are either created by Google and sent to HathiTrust as part of the Archival Information Package, or are generated at the time of ingest by HathiTrust based on other metadata coming from the digitization source. “Source” METS files contain information about the content, including bibliographic and administrative data.

HathiTrust generates the second kind of METS file, the “HathiTrust” METS file. This file includes a subset of the “Source” METS file data, as well as other data generated by HathiTrust in its stewardship of items in its corpus.

HathiTrust provides more information about the content of these METS files.

Exceptions to Sharing

None

Collection Builder Metadata Sharing Policy

Last updated March 17, 2022

This policy is a sub-policy of the HathiTrust Policy on Metadata Sharing and Use. It is intended to address the sharing of Collection Builder metadata generated by HathiTrust and its users and collected, managed, and stored by HathiTrust.

Collection Builder is a tool that allows users to create their own subsets of the HathiTrust repository to search and share with others or to keep private. Librarians, faculty, students, and researchers use Collection Builder to aggregate sources around a particular subject for classes, group projects, specialized library resources, or personal use. 

Collection Builder Metadata

When users create subsets of the HathiTrust corpus using Collection Builder, users create titles and descriptions of the subsets, and their HathiTrust username is attached to the subset. Users can elect to make those subsets available to the public.

The sharing of bibliographic data stored in the subset data is covered by the Bibliographic Metadata Sharing Policy (above). This policy addresses the user-created titles, descriptions, selection and arrangement of items for the subset.

Exceptions to Sharing

Subsets marked as “Private”

If subsets have been marked as private, they are not intended to be shared publicly and should not, therefore, be shared, even if those subsets were previously made available under a Creative Commons Public Domain Dedication (for example, when a user changes a subset’s sharing setting from public to private).
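A minimal sketch of this exception follows. The `Subset` record and its field names are hypothetical and do not reflect Collection Builder’s actual data model; the point illustrated is that only the current privacy setting matters, regardless of whether a subset was public in the past.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Subset:
    """Hypothetical record for one user-created subset."""
    title: str
    description: str
    username: str
    is_private: bool  # the subset's current sharing setting


def shareable_subsets(subsets: Iterable[Subset]) -> List[Subset]:
    """Keep only subsets that are currently public.

    A subset that was once public but is now private is excluded,
    because only the current setting is consulted.
    """
    return [s for s in subsets if not s.is_private]
```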

HathiTrust Research Center Workset Metadata Sharing Policy

Last updated March 17, 2022

This policy is a sub-policy of the HathiTrust Policy on Metadata Sharing and Use. It is intended to address the sharing of Workset metadata generated by the HathiTrust Research Center (“HTRC”) and by users of the HTRC Workset Builder. The Workset metadata is collected, managed, and stored by HTRC.

The HTRC Workset Builder is a beta tool and interface built over the HTRC Extracted Features Dataset. It enables volume-level metadata search as well as volume- and page-level unigram (single-word) text search of the extracted features in order to build worksets. Worksets can be public (viewable by users signed in to HTRC Analytics) or private (viewable only by their creator). Further information about the Workset Builder tool can be found in “Workset Builder 2.0 for Extracted Features 2.0: Search the Extracted Features Dataset.”

Description of Data

When users create worksets using HTRC Workset Builder, users provide titles and descriptions of the worksets. As a result, the metadata associated with a workset is the workset title and description, a list of volume IDs, and the workset creator’s HathiTrust username or alias name if it exists.  

Exceptions to Sharing

Private worksets should not be shared.

HathiTrust Research Center Extracted Features Metadata Sharing Policy

Last updated February 10, 2022

This policy is a sub-policy of the HathiTrust Policy on Metadata Sharing and Use. It is intended to address the sharing of Extracted Features metadata collected, managed, and generated by the HathiTrust Research Center. This policy applies to all versions of Extracted Features beginning with version 2.0.

Sharing of associated bibliographic metadata that appears in the Extracted Features Dataset is governed by the Bibliographic Metadata Sharing Policy (above) and the Rights-Related Metadata Sharing Policy (above). The policies are intended not to conflict.

Description of Data

HTRC Extracted Features (EF) datasets consist of metadata and derived data elements that have been extracted from volumes in the HathiTrust Digital Library. The dataset is periodically updated, including adding new volumes and adjusting the file schema.

This policy specifically addresses data that is calculated or algorithmically derived. Examples include page-level metadata, part-of-speech-tagged term token counts, header/footer identification, marginal character counts, and calculated language.

Exceptions to Sharing

None
