HathiTrust Training and Information Sessions
In the Update on July Activities, we distributed a short survey to gather feedback on our next series of HathiTrust information and training sessions, and we have received many responses. The deadline for completing the survey is September 21. If you have not already done so, please take a moment to tell us what kinds of sessions you would like to attend or lead, and what form you would prefer these sessions to take (e.g., a webinar series, an in-person meeting, or a combination of the two). The survey is available at http://tinyurl.com/8n3k9nr.
Data API Changes In Effect October 1
Beginning October 1, all requests to the Data API must be signed with an access key provided by HathiTrust. Access keys for programmatic use of the Data API can be obtained at http://babel.hathitrust.org/cgi/kgs/request. For non-programmatic use, HathiTrust has also created a Web client that uses a user’s login credentials in place of an access key. Complete documentation of the security enhancements, how to obtain keys, how to sign requests, and how to access the Web client is available at http://www.hathitrust.org/data_api.
Also effective October 1, the host “services.hathitrust.org” will no longer serve the Data API. The new host will be “babel.hathitrust.org”, the same host used by the PageTurner and other HathiTrust services. Calls to the Data API will therefore need to use URLs on the new host, which also include an additional “cgi” segment in the path.
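Clients that hard-code the old host could be migrated with a small URL rewrite. The Python sketch below assumes that the only path change is a leading “/cgi” segment, as described above; `migrate_data_api_url` is a hypothetical helper, and the documentation at http://www.hathitrust.org/data_api remains the authority on the new URL forms.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "services.hathitrust.org"
NEW_HOST = "babel.hathitrust.org"


def migrate_data_api_url(url: str) -> str:
    """Rewrite an old-host Data API URL for the new host.

    ASSUMPTION: besides the host, the only change is a leading "/cgi" path segment.
    """
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return url  # Not an old-host URL (or already migrated): leave it untouched.
    path = parts.path if parts.path.startswith("/cgi/") else "/cgi" + parts.path
    # Keep the scheme, query string, and fragment exactly as they were.
    return urlunsplit((parts.scheme, NEW_HOST, path, parts.query, parts.fragment))
```

Because already-migrated URLs are returned unchanged, the helper can safely be applied more than once to the same URL.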
Shibboleth Library Walk-in
Later this year, HathiTrust will begin accepting the “library-walk-in” Shibboleth attribute from partner institutions to extend certain member privileges to guest users who do not have an institutional login. For instance, library-walk-in users will be able to download full PDFs of all public domain materials in HathiTrust. Partners who wish to use HathiTrust library-walk-in functionality must confirm in writing that they assert the library-walk-in affiliation only for users who are physically present in a library building when the session is initiated. Please see Shibboleth Login for more information about Shibboleth in HathiTrust.
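The access rule described above can be illustrated with a short Python sketch. It is not HathiTrust’s implementation: it assumes the partner’s identity provider releases scoped affiliation values of the form “library-walk-in@<scope>”, and that HathiTrust keeps a set of scopes for partners that have confirmed the walk-in policy in writing. All function and parameter names are hypothetical.

```python
from typing import Iterable, Set


def is_library_walk_in(scoped_affiliations: Iterable[str], confirmed_scopes: Set[str]) -> bool:
    """True if any released affiliation is library-walk-in@<scope> for a confirmed partner.

    Hypothetical: `confirmed_scopes` holds lowercase scopes (e.g. "example.edu") of
    partners that have confirmed in writing how they assert the affiliation.
    """
    for value in scoped_affiliations:
        role, _, scope = value.partition("@")
        if role == "library-walk-in" and scope.lower() in confirmed_scopes:
            return True
    return False


def walk_in_can_download_full_pdf(scoped_affiliations: Iterable[str],
                                  confirmed_scopes: Set[str],
                                  is_public_domain: bool) -> bool:
    """Walk-in guests may download full PDFs of public domain volumes only."""
    return is_public_domain and is_library_walk_in(scoped_affiliations, confirmed_scopes)
```

Checking the scope against confirmed partners reflects the requirement that only partners who have agreed to the physical-presence condition may grant walk-in privileges.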
Internet Archive Digitization
HathiTrust ingested nearly all of a set of approximately 2,000 volumes from Boston College and loaded bibliographic records for additional volumes that the University of Illinois will deposit. The University of Florida submitted sample bibliographic records for analysis in preparation for content ingest.
Working Groups and Committees
Working groups and committees in HathiTrust may have an operational or strategic focus. See http://www.hathitrust.org/working_groups for more information.
Communications Working Group
The Communications Working Group did not meet in August, its first break since the group formed in May 2010. While the new HathiTrust governance structure is being finalized, group members plan to address the results of their survey on training and to look ahead to fall activities and meetings.
User Experience Advisory Group
The User Experience Advisory Group continued discussions about a new home page design and provided feedback on mockups created by the University of Michigan.
User Support Working Group
A summary of the issues received by the User Support Working Group is provided at the end of the update.
Bibliographic Data Management
California Digital Library (CDL) and University of Michigan staff agreed on a data workflow for updating rights information in the HathiTrust rights database once CDL takes responsibility for managing HathiTrust bibliographic data. The CDL team is refining the bibliographic data exports needed to support HathiTrust operations and improving their performance. Analysis continued on issues affecting a small percentage of poor-quality records.
Michigan staff successfully tested the bibliographic record submission process for Zephir (the new management system) and commented on the corresponding submission guidelines. In the coming month, CDL will contact institutions that currently contribute, or have previously contributed, content to HathiTrust to test the new process for submitting records. The test is aimed primarily at current content contributors, but all contributors will be invited. If your institution does not currently contribute content but would like to take part in the test, please contact email@example.com.
HathiTrust Research Center
The HTRC prepared for its first “UnCamp”, held in Bloomington, Indiana, on September 10-11. A full report on the gathering is forthcoming.
IMLS Quality Grant
Project staff continued work to finalize the quality review datasets. This included reviewing datasets for completeness, accuracy, and missing data, and performing reliability and validation testing on data for volumes that were double-coded for quality assurance purposes.
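The update does not say which reliability statistic the project uses. As one common illustration, agreement between two coders on double-coded volumes can be summarized with Cohen’s kappa, which corrects the observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The Python sketch below assumes each coder assigns one categorical quality rating per volume; the function name and rating labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Cohen's kappa for two coders' categorical ratings of the same volumes."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(coder_a)
    # Observed agreement: share of volumes given the same rating by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal rate, summed over categories.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        # Both coders used one identical category for every volume; kappa is
        # undefined here, and this sketch reports it as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)
```

For example, if two coders rate four volumes as good, good, poor, poor and good, poor, poor, poor, they agree on 75% of volumes, chance agreement is 50%, and kappa is 0.5.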
The IMLS grant advisory board met for the second time in mid-August. The project team presented its findings to date, and advisory board members provided input on the work to be completed in the final stages of the project and on future research directions. Over the next several months, the project team will focus on completing the design of user studies that further investigate quality in relation to the usefulness of digitized volumes, collecting data to support those studies, and conducting the studies themselves.
Efforts continue to develop a framework for certifying the quality of volumes in HathiTrust. This includes the development of a modified data collection Web interface based on the interfaces used in the grant thus far.
For more information on the project, please visit the project website.
The mPach team at the University of Michigan updated the project timeline on the HathiTrust project page. Work continued on modifications to the HathiTrust PageTurner to display JATS XML, and on refinements to the METS specification for mPach Submission Information Packages. Michigan staff made progress on enhancements to the Norm tool (part of content preparation), specifically to normalize bulleted lists, figures with captions, and tables. Wireframes are nearly complete for the Dashboard module (see the list of mPach modules for details). Michigan staff will present on mPach at the 2012 DLF Forum.
Staff at the University of Michigan continued work to improve general accessibility for HathiTrust Web applications.
Michigan staff extended the functionality of the Data API to serve full PDFs of volumes for print-on-demand services on Espresso Book Machines (EBMs) via the EspressNet sales network. Staff also augmented Data API usage monitoring to explicitly track signed requests, and made enhancements that will enable the Data API to deliver dynamically generated image derivatives (such as PNG images, rather than TIFF or JP2 images).
First Full Repository Upgrade
Development and testing for the metadata upgrade reported in the Update on July Activities have been completed, and the upgrade will begin in October.
Michigan staff continued to investigate the Solr edismax parser bug that prevents CJK searching from working properly. Staff confirmed that the bug also affects Solr 4.0 and submitted sample documents and queries demonstrating the problem to the Solr JIRA issue tracker (see https://issues.apache.org/jira/browse/SOLR-3589). Staff also investigated possible workarounds and corresponded by e-mail with several Blacklight developers who are working on CJK issues.
Staff also made changes to the automated full-text search indexing process so that failures caused by server errors are automatically re-queued.
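The re-queuing behavior can be sketched in Python, assuming a simple in-memory queue of volume identifiers. `ServerError`, `run_index_queue`, and the `max_attempts` cap are hypothetical; the cap is an added assumption that keeps a volume whose server errors never clear from cycling forever. Failures that are not server errors are recorded rather than re-queued.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class ServerError(Exception):
    """Hypothetical: raised when the indexing server fails (e.g. an HTTP 5xx response)."""


def run_index_queue(queue: Deque[str], index_volume: Callable[[str], None],
                    max_attempts: int = 3) -> List[str]:
    """Index every queued volume; re-queue server-error failures and return
    the volumes that failed permanently."""
    attempts: Dict[str, int] = {}
    failed: List[str] = []
    while queue:
        volume_id = queue.popleft()
        attempts[volume_id] = attempts.get(volume_id, 0) + 1
        try:
            index_volume(volume_id)
        except ServerError:
            if attempts[volume_id] < max_attempts:
                queue.append(volume_id)  # Transient: retry after the rest of the queue.
            else:
                failed.append(volume_id)  # Gave up after max_attempts server errors.
        except Exception:
            failed.append(volume_id)  # Not a server error: record it, do not retry.
    return failed
```

Appending a failed volume to the back of the queue, rather than retrying it immediately, gives a briefly unavailable server time to recover while other volumes are processed.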
The INEX (Initiative for the Evaluation of XML Retrieval) Book Track accepted a paper by Michigan developer Tom Burton-West on full-text search relevance ranking in HathiTrust. The paper will be published in the INEX 2012 Pre-proceedings as part of the CLEF Labs Working Notes.
Michigan staff made changes that will make it easier to support new formats in the PageTurner interface. The mPach project will make use of the changes to add support for JATS XML.
HathiTrust was unavailable on Monday, August 13 from 7:30-8am EDT for a security-related database reorganization.
HathiTrust sends notices when unscheduled outages are discovered and resolved, and in advance of scheduled outages or maintenance work that may result in an outage. We welcome additional recipients for these notices: if your institution is not receiving outage notifications and would like to, please contact firstname.lastname@example.org.
As of August 1:
|Library of Congress||1||89,722|
|North Carolina State University||0||3,196|
|University of North Carolina - Chapel Hill||0||8,088|
|New York Public Library||8||259,571|
|Penn State University||35||44,018|
|University of California||26,493||3,373,076|
|The University of Chicago||2,240||24,679|
|University of Illinois||823||101,001|
|University of Michigan||8,533||4,560,303|
|University of Minnesota||2,105||102,501|
|University of Wisconsin||3,559||542,795|
|University of Virginia||1,868||50,790|
Public Domain (~30%)
* Includes volumes opened through copyright review and rights holder permissions
Summary of Issues Received by User Support
Non-partner Digital Deposit
|Access and Use||119||112|
Print on Demand
Full-PDF or e-copy requests
Data Availability and APIs
Reuse of content
Problems with login specifically
General Questions about Login
Partners setting up login