[Download PDF [2]]
The year 2012 brought to a close the initial five-year charter period that HathiTrust was granted by its founding institutions. Five years on, the collaborative is stronger than ever. More than 70 academic and research institutions from around the world participate in HathiTrust, supporting a digital repository of 10.6 million volumes and a host of shared activities, all geared toward providing greater access to the scholarly and cultural record, more secure preservation, and greater research opportunities for our constituencies than ever before. As we launch into a new year and a new stage of HathiTrust, it is worthwhile to reflect on our progress and achievements in 2012. These include:
A recap of activities in these areas and more can be read below.
About HathiTrust
HathiTrust is an international partnership of academic and research institutions dedicated to ensuring the preservation and accessibility of the vast record of human knowledge. The partnership owns and operates a digital repository containing millions of public domain and in-copyright volumes, digitized from partnering institution libraries and other sources. The preserved volumes are made available in accordance with copyright law as a shared scholarly resource for students, faculty, and researchers at the partnering institutions, and as a public good to the world community. For more information, visit HathiTrust.org [3].
Details on each item can be found in the monthly updates from 2012, available at http://www.hathitrust.org/updates [4].
In a decisive victory for libraries and Fair Use, a lawsuit brought against HathiTrust and several participating libraries by the Authors Guild et al. was dismissed. Information [5] about the lawsuit, including responses and analysis from around the Web, can be found on the HathiTrust website.
HathiTrust grew from 66 to 78 partner institutions in 2012. New institutions include:
HathiTrust partners contributed 632,613 volumes to the repository in 2012. Of these, 566,044 are in the public domain. The University of Florida and Boston College were new contributors in 2012. Many others contributed additional content, as shown in the table near the end of the update.
Over the course of 2012, HathiTrust interacted with nearly a dozen institutions regarding ingest of locally-digitized content. We released a first iteration of ingest tools to aid institutions in validating and packaging locally-digitized content prior to submission to HathiTrust. We revised documentation [6] surrounding the tools based on feedback from institutions, and we also began to explore with institutions what the next iteration of the tools would look like. If you are using the tools now, think you might in the future, or are interested in more information, we encourage you to join our HathiTrust Ingest Google Group [7] to participate in discussions.
HathiTrust took bold steps in establishing a new governance model, seating a new Board of Governors [8], establishing an Executive Committee and Executive Committee officers, and drafting a set of bylaws. The bylaws will be put forward to the partnership for voting in early 2013.
The Collections Committee completed a report [9] on handling of duplicate volumes in HathiTrust, recommending that HathiTrust retain all duplicate copies for the time being, with periodic assessment.
The Communications Working Group released announcements related to HathiTrust’s achievement of 10 million volumes [10], the new Board of Governors [11], and the Authors Guild lawsuit [12]. The group also produced a new Resources [13] page for HathiTrust, launched a Pinterest [14] account, coordinated a survey of partners to receive input on the next iteration of partner training sessions, and, in collaboration with the UX Advisory Group, created a blog post on collections in HathiTrust [15].
The User Experience Advisory Group consulted on improvements to the HathiTrust PageTurner, including the addition of a version date for volumes, updated messages regarding download of PDFs, and a new landing page for volumes that are restricted from reading due to copyright, but are nevertheless full-text searchable. The UX Advisory Group also provided feedback on a new site-wide redesign currently in development.
The User Support Working Group submitted recommendations to the Board of Governors on User Support going forward from 2012. A summary of the User Support issues received in 2012 is given at the end of the review.
HathiTrust completed the first phase of improvements to enhance the accessibility of HathiTrust Web applications. With a few minor exceptions that will be addressed in the second phase, HathiTrust interfaces are now compliant with Web Content Accessibility Guidelines (WCAG) 2.0 [16], Level A.
HathiTrust accepted and released a new policy on bibliographic corrections: http://www.hathitrust.org/bib_metadata_correction [17].
HathiTrust initiated a project [18] to build a comprehensive registry of U.S. federal government documents.
HathiTrust began offering lawful access to digital copies of works that are out of print, when print copies owned by partner institutions are brittle or missing. More information is available at http://www.hathitrust.org/out-of-print-brittle [19].
HathiTrust made progress toward the migration of bibliographic data management from the University of Michigan to the California Digital Library’s Zephir system. Major activities in 2012 involved improving record loading processes in Zephir, syncing information between Zephir and other HathiTrust systems, exporting data from Zephir for use in the HathiTrust catalog and “hathifiles [20]”, development of new bibliographic metadata standards [21], development and testing of bibliographic record submission processes with current HathiTrust depositors, and progress toward a Zephir service-level agreement. Migration to Zephir is expected to occur in 2013.
In the early part of the year, the HathiTrust Research Center (HTRC) completed the agreements necessary to receive public domain data from the HathiTrust Repository. It also began to install systems for discovering, retrieving, correcting, and performing computation on OCR text of digital volumes. HTRC systems had their first public demonstration at an enthusiastic and highly successful HTRC “UnCamp” in September, attended by 130 researchers, developers, and librarians from HathiTrust member and non-member institutions. Resources from the UnCamp, including presentations, session materials, Twitter analysis, and pictures, are available on the HTRC wiki [22]. A video produced from the event is available at http://www.hathitrust.org/htrc [23].
The grant project team concluded all data gathering activities, including digital review of four 1,000-volume samples of volumes from HathiTrust and physical review of nearly all volumes in one sample and more than half of the volumes in a second sample (to investigate correlation between physical condition and digitization quality). Two of the samples underwent review more than once, as a new methodology was introduced to discover “whole-volume” errors such as missing and duplicate pages. In the coming months, as part of a no-cost extension, members of the team will conduct user studies to evaluate the results of the quality review performed on the sampled volumes. Initial findings from studies undertaken in the grant can be found at the links below. More results will be posted on the project website [24] as analysis concludes and as articles containing the results are published throughout the coming months.
mPach is a system under development by the University of Michigan Library to publish open access born-digital journal content, along with accompanying data and media files, directly into HathiTrust for perpetual access and preservation. Work in 2012 focused on refining the project’s design principles and requirements [29] and system architecture [30], establishing a timeline [31] for the project, and designing and developing mPach modules [32] and associated workflows to a) create archival XML in JATS format from DOCX files and b) deliver the resulting XML and supplementary files through HathiTrust applications.
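To give a rough sense of what "archival XML in JATS format" looks like, the sketch below builds a bare-bones article skeleton with Python's standard library. It is an illustration only and not mPach's implementation: the function name `build_jats_article`, the sample titles, and the choice to pass paragraphs as plain strings (standing in for text already extracted from a DOCX file) are all assumptions, and the output is not validated against the JATS DTD, which requires more metadata than shown here.

```python
import xml.etree.ElementTree as ET


def build_jats_article(journal_title, article_title, paragraphs):
    """Build a minimal JATS-style <article> element (hypothetical helper).

    Only a handful of elements from the JATS tag set are used; a real
    archival record would carry much more metadata (identifiers, authors,
    dates, licensing, and so on).
    """
    article = ET.Element("article", {"article-type": "research-article"})

    # <front> holds journal-level and article-level metadata.
    front = ET.SubElement(article, "front")
    journal_meta = ET.SubElement(front, "journal-meta")
    title_group_j = ET.SubElement(journal_meta, "journal-title-group")
    ET.SubElement(title_group_j, "journal-title").text = journal_title

    article_meta = ET.SubElement(front, "article-meta")
    title_group_a = ET.SubElement(article_meta, "title-group")
    ET.SubElement(title_group_a, "article-title").text = article_title

    # <body> holds the article text; each input string becomes one <p>.
    body = ET.SubElement(article, "body")
    for text in paragraphs:
        ET.SubElement(body, "p").text = text

    return article


# Sample values are placeholders, not real journal content.
article = build_jats_article(
    "Example Journal",
    "Example Article",
    ["First paragraph.", "Second paragraph."],
)
xml_text = ET.tostring(article, encoding="unicode")
print(xml_text)
```

Keeping metadata in `front` and content in `body` mirrors the basic JATS layout; supplementary data and media files would be referenced from the XML rather than embedded in it.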
Development in 2012 included the following:
All HathiTrust papers and presentations can be accessed at http://www.hathitrust.org/papers [41].
Copyright determinations conducted as part of CRMS-US [42] and CRMS-World [43].
| Project | December: Public Domain Determinations | December: All Determinations | Overall: Public Domain Determinations | Overall: All Determinations |
| CRMS-US | 41,268 | 79,817 | 119,822 | 219,874 |
| CRMS-World | 13,445 | 23,519 | 14,202 | 28,795 |
| Total | 54,713 | 103,336 | 135,777 | 248,669 |
As of January 1, 2013:
| Institution | December | Overall |
| Boston College | 1,842 | 1,842 |
| Columbia University | 214 | 64,390 |
| Cornell University | 31,745 | 415,435 |
| Duke University | 1 | 4,523 |
| Harvard University | 182,545 | 235,985 |
| Indiana University | 8,161 | 195,073 |
| Library of Congress | 311 | 89,722 |
| North Carolina State University | 0 | 3,196 |
| Northwestern University | 7,073 | 12,722 |
| New York Public Library | 121 | 259,574 |
| Penn State University | 1,815 | 44,732 |
| Princeton University | 1,972 | 251,651 |
| Purdue University | 43,741 | 44,629 |
| Universidad Complutense | 3,233 | 111,901 |
| University of California | 95,601 | 3,383,255 |
| The University of Chicago | 16,112 | 26,720 |
| University of Florida | 2,008 | 2,008 |
| University of Illinois | 90,384 | 104,887 |
| University of Michigan | 105,235 | 4,609,836 |
| University of Minnesota | 13,973 | 104,212 |
| University of North Carolina, Chapel Hill | 1 | 8,088 |
| University of Wisconsin | 23,046 | 550,380 |
| University of Virginia | 3,403 | 50,799 |
| Utah State | 71 | 117 |
| Yale University | 4 | 23,678 |
| Total | 632,613 | 10,599,355 |
Public Domain (~31%)
| Total* | 566,044 | 3,278,630 |
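The "~31%" public domain share quoted with the totals can be checked from the two Overall figures. The short sketch below does that arithmetic; the variable and function names are illustrative, and the numbers are copied directly from the Total rows.

```python
# Overall figures copied from the Total rows of the contribution table.
total_volumes = 10_599_355          # all volumes in the repository
public_domain_volumes = 3_278_630   # public domain volumes


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


pd_share = share(public_domain_volumes, total_volumes)
print(f"Public domain share of the repository: {pd_share:.1f}%")
```

The result is a little under 31%, which rounds to the approximate figure given with the table.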
| Issue Type | 2012 (Jan-Dec) | 2011 (Mar-Dec) |
| Content | 1,038 | 962 |
| - Quality | 971 | 905 |
| - Non-partner Digital Deposit | 10 | 6 |
| - Collections | 57 | 45 |
| Cataloging | 807 | 238 |
| Access and Use | 969 | 898 |
| - Copyright | 811 | 500 |
| - Permissions | 158 | 151 |
| - Takedown | 11 | 11 |
| - Print on Demand | 8 | 12 |
| - Inter-library loan | 24 | 12 |
| - Full-PDF or e-copy requests | 198 | 175 |
| - Datasets | 38 | 25 |
| - Data Availability and APIs | 9 | 14 |
| - Reuse of content | 25 | 27 |
| Web applications | 220 | 229 |
| - Functionality problems | 61 | 66 |
| - Problems with login specifically | 9 | 19 |
| - General Questions about Login | 21 | 21 |
| - Partners setting up login | 21 | 23 |
| - Usability issues | 20 | 30 |
| - Feature requests | 24 | 37 |
| Partner Ingest | 40 | 25 |
| General | 832 | 316 |
| - Partnership | 126 | 83 |
| - Infrastructure | 4 | 4 |
| - Miscellaneous | 702 | 229 |
| Total | 3,830 | 2,668 |
Links:
[1] http://www.hathitrust.org/updates_rss
[2] http://www.hathitrust.org/documents/hathitrust-updates-review2012.pdf
[3] http://www.hathitrust.org/home
[4] http://www.hathitrust.org/updates
[5] http://www.hathitrust.org/authors_guild_lawsuit_information
[6] http://www.hathitrust.org/ingest_tools
[7] https://groups.google.com/forum/?fromgroups#!forum/hathitrust-ingest
[8] http://www.hathitrust.org/board_of_governors
[9] http://www.hathitrust.org/documents/hathitrust-collections-duplicates-report-201204.pdf
[10] http://www.hathitrust.org/blogs/perspectives-from-hathitrust/ten-million-and-counting
[11] http://www.hathitrust.org/hathitrust-announces-new-board-of-governors
[12] http://www.hathitrust.org/authors_guild_lawsuit_ruling
[13] http://www.hathitrust.org/resources
[14] http://pinterest.com/hathitrust/
[15] http://www.hathitrust.org/blogs/perspectives-from-hathitrust/whats-in-your-collection
[16] http://www.w3.org/TR/WCAG/
[17] http://www.hathitrust.org/bib_metadata_correction
[18] http://www.hathitrust.org/usgovdocs_registry
[19] http://www.hathitrust.org/out-of-print-brittle
[20] http://www.hathitrust.org/hathifiles
[21] http://www.hathitrust.org/bib_specifications
[22] http://wiki.htrc.illinois.edu/display/OUT/HTRC UnCamp2012
[23] http://www.hathitrust.org/htrc
[24] http://hathitrust-quality.projects.si.umich.edu/index.htm
[25] http://hathitrust-quality.projects.si.umich.edu/inter-rater-reliability.htm
[26] http://hathitrust-quality.projects.si.umich.edu/distribution-of-error.htm
[27] http://hathitrust-quality.projects.si.umich.edu/co-occurence-of-error.htm
[28] http://hathitrust-quality.projects.si.umich.edu/physical-characteristics-of-the-original-source.htm
[29] http://www.lib.umich.edu/mpach/design-principles-and-requirements
[30] http://www.lib.umich.edu/files/departments/mPach_Overview_Diagram_v4.png
[31] http://www.hathitrust.org/mpach
[32] http://www.lib.umich.edu/mpach/modules
[33] http://www.hathitrust.org/automatic_login
[34] http://www.hathitrust.org/data_api
[35] http://www.hathitrust.org/deletion
[36] http://www.hathitrust.org/help_digital_library%2523SearchTips
[37] http://www.hathitrust.org/embed
[38] http://www.hathitrust.org/shibboleth
[39] http://bit.ly/14eiqOJ
[40] http://www.hathitrust.org/digital_object_specifications
[41] http://www.hathitrust.org/papers
[42] http://www.lib.umich.edu/imls-national-leadership-grant-crms
[43] http://www.lib.umich.edu/imls-national-leadership-grant-crms-world
[44] http://hdl.handle.net/2027/mdp.39015065192919
[45] http://hdl.handle.net/2027/pur1.32754077064610
[46] http://hdl.handle.net/2027/mdp.39015054061430
[47] http://hdl.handle.net/2027/coo.31924086713504
[48] http://hdl.handle.net/2027/mdp.39015065192935
[49] http://hdl.handle.net/2027/mdp.39015003228171
[50] http://hdl.handle.net/2027/mdp.39015065192927
[51] http://hdl.handle.net/2027/mdp.39015000804453
[52] http://hdl.handle.net/2027/mdp.39015016901020
[53] http://hdl.handle.net/2027/mdp.39015048771052
[54] http://hdl.handle.net/2027/mdp.39015009011811
[55] http://hdl.handle.net/2027/mdp.39015008691761