HathiTrust makes the texts of public domain works in its corpus available for research purposes. The works fall into two sets: non-Google-digitized volumes, which are freely available, and Google-digitized volumes, which are available through an agreement with Google. Both sets are described below. Please send questions to hathitrust-datasets@umich.edu [1].
Non-Google-digitized volumes
This set contains approximately 350,000 public domain volumes as of February 2013, primarily, though not exclusively, English-language materials published before 1923.
Google-digitized volumes
This set contains approximately 2.8 million public domain volumes as of February 2013, representing a wide variety of languages, subjects, and dates. See the visualizations [5] of HathiTrust public domain volumes.
Access and Use
These volumes were digitized by Google and are available through an agreement with Google [6] that must be signed on behalf of researchers by an institutional sponsor (someone with appropriate signing authority at the researcher's institution). In general, the limits on use of these materials are as follows:
To begin the process of receiving texts (or making them available to researchers), researchers or institutional representatives should:
When you submit your proposal, please indicate whether or not you give permission to share the proposal publicly.
The following institutions have signed, or are in the process of signing, an agreement with Google for use of the texts. If you are a researcher affiliated with one of these institutions, you may proceed directly to submitting your proposal, and we will contact your institutional sponsor.
If you do not need the entire set of non-Google-digitized or Google-digitized materials, we can help create a custom dataset. To do so, we need a list of the ids of the desired volumes. Volume ids appear in the persistent identifiers for HathiTrust volumes (e.g., http://hdl.handle.net/2027/mdp.39015021715670 [7], whose volume id is mdp.39015021715670). Volume ids can be retrieved in the following ways:
Please send lists of ids or queries to hathitrust-datasets@umich.edu [1].
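If you already have persistent identifiers, the volume ids can be pulled out of them before sending a list. The sketch below is illustrative only; the function names are hypothetical. It assumes every identifier has the form http://hdl.handle.net/2027/<volume id>, as in the example above, and keeps everything after "/2027/" intact, since it assumes some ids may themselves contain "/" or ":". The second helper assumes a Hathifiles-style tab-separated file [10] [11] with the volume id in the first column and the rights code in the third, and it filters on the rights codes "pd" and "pdus" (the codes named in the non_google_pd_pdus.zip file name). Check the Hathifiles description [11] before relying on that column layout.

```python
import csv
from urllib.parse import urlparse

# Path prefix of HathiTrust handles, e.g. http://hdl.handle.net/2027/mdp.39015021715670
HANDLE_PREFIX = "/2027/"


def volume_id_from_handle(url: str) -> str:
    """Return the volume id embedded in a hdl.handle.net/2027/ URL (hypothetical helper)."""
    path = urlparse(url.strip()).path
    if not path.startswith(HANDLE_PREFIX):
        raise ValueError(f"not a HathiTrust handle: {url!r}")
    # Take the whole remainder; ids are not split further on "/" or ":".
    return path[len(HANDLE_PREFIX):]


def namespace(volume_id: str) -> str:
    """Return the part before the first '.', e.g. 'mdp' for 'mdp.39015021715670'."""
    return volume_id.split(".", 1)[0]


def build_id_list(urls) -> str:
    """Extract ids, drop duplicates (keeping first-seen order), one id per line."""
    ids = dict.fromkeys(volume_id_from_handle(u) for u in urls)
    return "".join(f"{vid}\n" for vid in ids)


def public_domain_ids(lines, rights_codes=("pd", "pdus")):
    """Yield ids from Hathifiles-style TSV lines whose rights code is in rights_codes.

    Assumed layout: column 1 = volume id, column 3 = rights code.
    """
    # Assume the file is plain tab-separated text with no quoting, so quote
    # characters inside titles are kept as-is rather than parsed.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) >= 3 and row[2] in rights_codes:
            yield row[0]


if __name__ == "__main__":
    handles = [
        "http://hdl.handle.net/2027/mdp.39015021715670",
        "https://hdl.handle.net/2027/mdp.39015021715670",  # duplicate id
    ]
    print(build_id_list(handles), end="")
```

Running the script prints mdp.39015021715670 once, because both handles point to the same volume. The resulting text, one id per line, is the kind of list to send to the address above.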
Links:
[1] mailto:hathitrust-datasets@umich.edu
[2] http://www.hathitrust.org/data_api
[3] https://dev.www.lib.umich.edu/hathitrust/non_google_pd_pdus.zip
[4] http://www.hathitrust.org/documents/non_google_pd_pdus.zip
[5] http://www.hathitrust.org/statistics_visualizations
[6] http://www.hathitrust.org/documents/Google_PD_Distribution_Agreement_template.docx
[7] http://hdl.handle.net/2027/mdp.39015021715670
[8] http://www.hathitrust.org/home
[9] http://babel.hathitrust.org/cgi/mb?a=listcs;colltype=pub
[10] http://www.hathitrust.org/hathifiles
[11] http://www.hathitrust.org/hathifiles_description
[12] https://dev.www.lib.umich.edu/hathitrust/data_api
[13] http://www.lib.umich.edu/two-over-threehundred