
Update on September 2012 Activities

October 12, 2012


Late Breaking News


Case Closed - HathiTrust is Fair Use

In a decision that will have broad repercussions across libraries, on October 10, 2012 Judge Baer dismissed the lawsuit filed just over a year ago by the Authors Guild et al. against HathiTrust and several participating libraries. HathiTrust has released an official statement on the ruling. Information about the lawsuit, as well as relevant analysis and reactions from around the Web, is available on the HathiTrust website.

Top News


HathiTrust Research Center UnCamp

The HathiTrust Research Center held its first annual “UnCamp” in Bloomington, IN on September 10-11. A total of 130 researchers, developers, and librarians from HathiTrust member and non-member institutions gathered in Indiana University’s new CyberInfrastructure Building for presentations, demos, and hands-on sessions with the emerging Research Center tools. These included tools both to perform research on the HathiTrust corpus and to create new or customized algorithms and processes for research. Responses to the UnCamp have been very enthusiastic, giving energy to efforts to enable computational access to the incredible body of works in HathiTrust. More information on the UnCamp, including presentations, resources, and reactions and responses via tweets, can be found on the HathiTrust Research Center Wiki. See also the press release from the University of Illinois.

Government Documents Registry

HathiTrust has initiated a project to build a comprehensive registry of U.S. federal government documents. The Registry is an emerging effort in a broader undertaking by HathiTrust partners to improve access to U.S. federal government documents. Further information and background on the project are available on the Registry project page. A two-year term Government Documents Registry Analyst position for the project was posted in September.

Infrastructure Changes for Out of Print and Brittle

In the coming weeks, HathiTrust will begin making infrastructural changes to incorporate information about the holdings status and condition of volumes at partner institutions into access services. The changes will apply in particular to access on library premises to in-copyright works that fall under Section 108 provisions of the U.S. Copyright Act. One of the infrastructural changes will be altering the semantics of the “out-of-print and brittle” (“opb”) designation in HathiTrust’s rights database to “out-of-print” (“op”) only. This change will be made on November 1, 2012, and will be reflected in HathiTrust interfaces and in services, such as the Hathifiles, where rights information is made available.
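Consumers of rights data that still hold the older designation may want a small normalization step after the change. The following is a minimal sketch, assuming rights codes are handled as plain strings; the helper name and the mapping table are hypothetical and not part of any HathiTrust tool.

```python
# Hypothetical mapping from the retired code to its replacement,
# based only on the "opb" -> "op" change described above.
LEGACY_RIGHTS_CODES = {"opb": "op"}


def normalize_rights_code(code: str) -> str:
    """Return the current rights code for a possibly legacy value."""
    cleaned = code.strip().lower()
    # Codes that are not in the legacy table pass through unchanged.
    return LEGACY_RIGHTS_CODES.get(cleaned, cleaned)
```

Keeping the mapping in one table means any locally cached rights data can be brought in line with a single pass, without touching other codes.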

Ingest


Internet Archive Digitization

HathiTrust coordinated with the University of Florida on upcoming deposit of volumes, and ingested a new batch of volumes from Penn State.

Local Digitization

HathiTrust ingested a new set of volumes from Utah State University Press and began conversations with the University of Delaware about processes and requirements for deposit of locally-digitized content. HathiTrust also corresponded with the University of Iowa about use of the new tools for validating and packaging locally-digitized materials for deposit. Institutions with questions about the new tools should contact feedback@issues.hathitrust.org.

Working Groups and Committees


Working groups and committees in HathiTrust may have an operational or strategic focus. See http://www.hathitrust.org/working_groups for more information.

Operational

Communications Working Group

The Communications Working Group continued to follow developments in HathiTrust governance, and to evaluate how the communications function in HathiTrust might be improved once the new governance structure is in place. The survey for HathiTrust training and information sessions has closed, and HathiTrust will use the results as a basis for upcoming informational events. If you did not have a chance to submit feedback and would like to, please email responses to the survey to feedback@issues.hathitrust.org.

User Experience Advisory Group

The User Experience Advisory Group continued discussions about a new home page design and provided feedback on mockups created by the University of Michigan.

User Support Working Group

A summary of issues received by the User Support Working Group is given in the table at the end of the update.

Projects


Bibliographic Data Management

California Digital Library (CDL) is in the final phase of development to bring Zephir into parity with the existing bibliographic management system at the University of Michigan. Once Zephir is in operation, institutions will submit bibliographic records for volumes they plan to deposit to Zephir, and Zephir will produce exports of bibliographic data that will be used in HathiTrust Web services. In October, as part of preparations for integration testing with HathiTrust systems, CDL staff will begin preliminary testing of the Zephir outputs to evaluate system performance and confirm the structure of outputs (that they have the correct metadata fields, etc.). CDL has been contacting institutions that are contributing records to HathiTrust on an ongoing basis to test the process for submitting bibliographic records to Zephir. If your institution is not contributing content to HathiTrust currently but you would like to test the new submission process, please contact feedback@issues.hathitrust.org.

Copyright Review

A summary of copyright review activities in September is given below.

              September             Overall
              Opened    Reviewed    Opened     Reviewed
CRMS-US       4,700     9,176       174,695    330,059
CRMS-World    3,656     7,191       10,248     22,266
Total         8,356     16,637      184,943    352,325

IMLS Quality Grant

The project team finalized a catalog of commonly seen illustration errors in HathiTrust volumes for a sub-study on illustrative error. Donald Williams, a renowned research imaging scientist, analyzed the errors and met with members of the project team to explain the sources of the errors and possibilities for correction.

The project team continued work on the design of user studies to evaluate project findings, collection of data to support the user studies, and administration of the user studies themselves. Team members also discussed ways that quality review interfaces developed during the grant might be modified to support the certification of individual volumes. For more information on the project, please visit the project website.

mPach

Staff at the University of Michigan created a mockup of PageTurner changes that will be needed to navigate the XML-based journal articles that will be submitted via mPach. Work also continued on modifications to PageTurner to display JATS XML and embedded media, and on refinements to the METS specification for mPach Submission Information Packages. Staff completed wireframes and began coding the Dashboard module (see the list of mPach modules for more information). Michigan staff members will present on mPach at the 2012 DLF Forum.

Development Updates



Accessibility

HathiTrust has completed the first phase of improvements to enhance accessibility of HathiTrust Web applications. With a few minor exceptions that will be addressed in the second phase, HathiTrust interfaces are now compliant with Web Content Accessibility Guidelines (WCAG) 2.0, Level A. The second phase will target compliance with WCAG 2.0 Level AA and include usability testing by users who have print disabilities.

Data API

As of October 1, all requests to the Data API must be signed with an access key provided by HathiTrust. Details are available at http://www.hathitrust.org/data_api.
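This update does not describe the signing scheme itself, so the following is only a generic sketch of keyed request signing: an HMAC computed over the sorted query parameters. The parameter names (`access_key`, `signature`), the secret key, and the signature layout are all assumptions for illustration; the Data API documentation linked above is the authority on the actual scheme.

```python
import hashlib
import hmac
from urllib.parse import urlencode


def sign_request(base_url, params, access_key, secret_key):
    """Return a URL whose query string carries an HMAC-SHA256 signature.

    Generic illustration only: the parameter names and signature layout
    are hypothetical, not the HathiTrust Data API's actual scheme.
    """
    signed_params = dict(params, access_key=access_key)
    # Sort parameters so client and server build the same canonical string.
    canonical = urlencode(sorted(signed_params.items()))
    message = f"{base_url}?{canonical}".encode("utf-8")
    digest = hmac.new(secret_key.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return f"{base_url}?{canonical}&signature={digest}"
```

The point of the pattern is that the secret never travels with the request: the server recomputes the signature from the same canonical string and rejects requests whose signature does not match.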

The Data API is being configured to deliver watermarked image derivatives in JPEG and PNG formats at a range of resolutions. The API currently delivers un-watermarked master images from the repository in TIFF and JP2. The Data API Web client was enhanced to support development-level debugging and to handle image derivatives once they become available through the Data API.

Full-text Search

University of Michigan staff modified the full-text search indexing process to prevent volumes from being indexed on more than one shard (section) of the full-text Solr index. Staff also began testing full-text search using Solr 4.0 Beta. Solr 4.0 offers new ranking algorithms that may provide better relevance ranking for long documents (e.g., books). A paper by Michigan developer Tom Burton-West on full-text search relevance ranking in HathiTrust was published in the INEX 2012 pre-proceedings as part of the CLEF Labs Working Notes.
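One common way to guarantee that a volume lands on exactly one shard is to derive the shard from a stable hash of the volume identifier. The sketch below illustrates that idea only; it is not the indexing code used at Michigan, and the function name, identifier format, and shard count are hypothetical.

```python
import hashlib


def shard_for_volume(volume_id: str, num_shards: int) -> int:
    """Map a volume identifier to exactly one shard index in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Use a stable hash (not Python's salted hash()) so the result
    # is the same across processes and indexing runs.
    digest = hashlib.sha1(volume_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Because the assignment depends only on the identifier and the shard count, re-indexing a volume always targets the same shard, so duplicates across shards cannot arise from the routing step.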

Following several months of informal research, Michigan staff began focused investigation into high-performance storage systems to improve full-text search response time and substantially increase search throughput capacity. An RFP for a new high-performance storage system will be issued in October.

Imgsrv

Imgsrv is the Web application that serves derivatives of HathiTrust’s master images to Web applications such as the PageTurner. HathiTrust has enhanced Imgsrv to deliver HTML derivatives of born-digital content in support of mPach and JATS XML.

PageTurner

HathiTrust implemented interface improvements designed by Michigan’s User Experience department for cases where special access to HathiTrust materials is available, such as access by users who have print disabilities. The improvements include dismissible notifications when special access is in effect, and updated explanatory text when special access that might be expected is not available (special access cases are described in HathiTrust’s Access and Use Policies). Special access is currently only available as a pilot at the University of Michigan. Extension of special access to other member institutions is still planned. More information will be forthcoming.

HathiTrust’s embeddable PageTurner is now based on the mobile PageTurner interface, which offers improved presentation and greater functionality.

HathiTrust has updated the version information displayed in the PageTurner to include the time a volume was removed from HathiTrust. Volumes may be removed from HathiTrust at the request of the rights holder, or in cases where the volume is wholly unusable or a superior copy is available.

Outages

From 1:00pm on Tuesday, September 25 to 8:30am on Friday, September 28, some bibliographic data failed to display in HathiTrust due to an outage of the system at Michigan that manages bibliographic data for HathiTrust.

HathiTrust sends notice upon discovery and resolution of unscheduled outages and in advance of scheduled outages and maintenance work that may result in an outage. We welcome and encourage additional recipients for these notices. If your institution is not receiving outage notifications and would like to, please contact feedback@issues.hathitrust.org.

New Growth

As of September 1:

September Overall
Boston College 0 1,816
Columbia University 0 64,184
Cornell University 82 408,837
Duke University 0 4,523
Harvard University 0 235,983
Indiana University 0 187,683
Library of Congress 0 89,722
North Carolina State University 0 3,196
University of North Carolina - Chapel Hill 0 8,088
Northwestern University 7 7,221
New York Public Library 0 259,571
Penn State University 113 44,131
Princeton University 0 251,644
Purdue University 2,418 40,466
Universidad Complutense 0 111,899
University of California 796 3,373,872
The University of Chicago 238 24,917
University of Illinois 9 101,010
University of Michigan 22,241 4,582,544
University of Minnesota 115 102,616
University of Wisconsin 2,993 545,788
University of Virginia 0 50,790
Utah State 27 117
Yale University 0 23,678
Total 29,039 10,524,296

Public Domain (~30%)

Total* 24,016 3,211,760

* Includes volumes opened through copyright review and rights holder permissions

Summary of Issues Received by User Support

Issue Type September August
Content 248 286
    Quality 242 279
    Non-partner Digital Deposit 0 1
    Collections 2 3
Cataloging 80 142
Access and Use 116 119
    Copyright 71 62
    Permissions 5 15
    Takedown 2 0
    Print on Demand 0 1
    Inter-library loan 4 8
    Full-PDF or e-copy requests 11 21
    Datasets 3 7
    Data Availability and APIs 0 1
    Reuse of content 1 4
Web applications 12 22
    Functionality problems 4 8
    Problems with login specifically 0 1
    General Questions about Login 0 1
    Partners setting up login 0 4
    Usability issues 0 0
    Feature requests 0 2
Partner Ingest 3 4
General 55 74
    Partnership 10 9
    Infrastructure 0 0
    Miscellaneous 45 65
Total 514 647

Papers and Presentations

Tom Burton-West, "Practical Relevance Ranking for 10 Million Books", INEX 2012 pre-proceedings, CLEF Labs Working Notes, September 2012.

HathiTrust UnCamp presentations and resources (via HathiTrust Research Center Wiki), September 10-11, 2012.

Heather Christenson and John Wilkin, "Intellectual Property Rights and the HathiTrust Collection" (forthcoming), UNESCO - The Memory of the World in the Digital Age: Digitization and Preservation, September 26, 2012.

Jeremy York, "A Preservation Infrastructure Built to Last: Preservation, Community, and HathiTrust", UNESCO - The Memory of the World in the Digital Age: Digitization and Preservation, September 26, 2012.

See http://www.hathitrust.org/papers for all papers, presentations, and reports.