
Update on July 2013 Activities

August 9, 2013


Top News


HathiTrust Research Center

The HathiTrust Research Center (HTRC) is pleased to announce that the Andrew W. Mellon Foundation has awarded $437,000 to the University of Illinois at Urbana-Champaign, in partnership with Indiana University, for an exciting new project entitled “Workset Creation for Scholarly Analysis: Prototyping Project” (WCSA). The two-year project will focus on enriching and augmenting metadata for the HathiTrust corpus to support the selection and discovery of the resources that scholars need to gather for computational analysis and scholarly investigation. As part of the project, HTRC will release an open, competitive Request for Proposals in November 2013, with the intent to fund four prototyping projects that will build tools for enriching and augmenting HathiTrust metadata.

You can learn more about our new project at the second annual HTRC UnCamp, September 8-9, 2013 at the University of Illinois. Registration is now open, and details are available at http://www.hathitrust.org/htrc_uncamp2013.

Program Steering Committee

The newly constituted Program Steering Committee has held its first conference call and is planning a full-day meeting in September. Early priorities will be to organize action around two proposals approved at the Constitutional Convention, namely to establish a distributed print monograph archiving program and to expand and enhance access to U.S. federal publications, as well as an expansion of the current policies concerning metadata. The Committee is also considering the future role of existing HathiTrust committees and what new working groups may be needed to carry out its work.

User Support Working Group

HathiTrust is pleased to welcome six new members to the User Support Working Group (USWG). The USWG is a multi-institutional group responsible for receiving, responding to, and appropriately routing all user inquiries submitted to HathiTrust. The group works with staff at HathiTrust partner institutions to address a wide range of issues, from copyright and quality concerns to login problems and requests to accession new volumes. New members include: Leila Smith (Harvard University), Geoffrey D. Swindells (Northwestern University), Josh Hadro (New York Public Library), Rachel S. Fox Von Swearingen (Syracuse University), Leigh Billings (University of Michigan), and Dale Larsen (University of Utah). The full membership and charge of the group are available at http://www.hathitrust.org/wg_user-support_charge. A summary of User Support inquiries received in July is included at the end of this update.

Ingest


General

HathiTrust continued to correspond with Texas A&M University, the University of Maryland, Indiana University, and the University of Florida regarding ingest of locally digitized materials. HathiTrust also discussed future deposits of Internet Archive-digitized materials with the Library of Congress, the University of Connecticut, and the University of Maryland.

Projects


Bibliographic Data Management

The California Digital Library (CDL) team began to load all current HathiTrust bibliographic records into a production instance of the new metadata management system, Zephir. Once the records are loaded, staff at CDL and the University of Michigan will bring Zephir and the current bibliographic management system at Michigan into parity and enter a parallel phase, running both systems in tandem to ensure that Zephir is well-positioned to go into production as the HathiTrust metadata management system.

Special Note: Beginning August 15, we ask that all institutions contributing records to HathiTrust send the records to both the University of Michigan and the University of California. See http://www.hathitrust.org/ingest_checklist for details. Please contact feedback@issues.hathitrust.org with any questions.

Copyright Review

A summary of the determinations from HathiTrust copyright review activities in July is given below. See CRMS-US and CRMS-World for further information.

 

Project | July: Public Domain Determinations | July: All Determinations | Overall: Public Domain Determinations | Overall: All Determinations
CRMS-US | 3,625 | 8,056 | 142,614 | 268,636
CRMS-World | 2,208 | 3,177 | 32,068 | 59,644
Total | 5,833 | 11,173 | 174,682 | 328,280

mPach

Staff at the University of Michigan refined plans for storing the data that will enable linking among records for individual articles in the HathiTrust catalog, the full-text view of articles in the HathiTrust PageTurner, a journal-level record for articles in the HathiTrust catalog, and information about the journal in the HathiTrust Collection Builder application. Staff also improved the structure of the METS metadata files that will accompany mPach articles, as well as the capabilities for rendering full-text articles in HTML, PDF, and EPUB.

Development Updates


HathiTrust institutions performed the following work related to applications and Web interfaces:

Data API

  • Staff implemented support for JATS articles in version 2 of the Data API, in conjunction with the mPach project. Staff also enhanced the Data API user interface to support viewing as well as downloading, and prepared the interface to support JATS articles.
  • Staff developed naming conventions for the METS profile URIs to be used for book, audio, JATS, and TEI materials in HathiTrust. The conventions will support differential handling of materials in the repository based on format, and facilitate the addition of new formats in the future. Small numbers of audio files have been added to the repository over the last year as part of a pilot project. The mPach project will soon be submitting JATS XML. The timeline for supporting TEI is to be determined.

Full-text Search

After shipping delays caused by a manufacturing backlog, the new flash-based, high-performance storage to be used with full-text search arrived in Michigan, and HathiTrust staff began initial configuration and testing. After consulting with the manufacturer on requirements, staff drafted a Request for Quotation for high-performance networking to connect the storage to the search indexing servers; the request will be issued in August.

Staff performed preliminary performance tests of the Solr index’s grouping functionality as part of work to improve relevancy ranking of full-text search results. Staff also evaluated the suitability of new relevancy ranking algorithms that are available in Solr 4.
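For readers unfamiliar with result grouping: Solr can collapse matching documents that share a field value into groups, using request parameters such as `group`, `group.field`, and `group.limit`. The Python sketch below only builds such a query string. The field name `volume_id` and the sample query are hypothetical placeholders; this update does not describe the HathiTrust index schema or how staff configured their tests.

```python
from urllib.parse import urlencode


def build_grouped_query(user_query: str, group_field: str, per_group: int = 1) -> str:
    """Return a Solr query string that groups hits by `group_field`.

    `group_field` is a caller-supplied (here hypothetical) field name;
    nothing in this sketch reflects the actual HathiTrust schema.
    """
    params = {
        "q": user_query,
        "group": "true",             # turn on result grouping
        "group.field": group_field,  # hits sharing this field value form one group
        "group.limit": per_group,    # how many documents to return per group
        "group.ngroups": "true",     # also report the number of distinct groups
    }
    # urlencode keeps dict insertion order and encodes spaces as '+'
    return urlencode(params)


print(build_grouped_query("whaling voyages", "volume_id"))
```

Grouping by a volume-level field would return one entry per matching volume instead of one per matching page or chunk. That is one plausible reason grouping matters for relevancy ranking, although the update does not say how HathiTrust intends to use it.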

Staff reorganized the structure of the file system supporting full-text search indexing to optimize the management of indexing operations, and adjusted the indexing systems that rely on this file system accordingly.

Storage Hardware Replacement Cycle

Staff removed all equipment due for retirement from service, performed appropriate security wipes, and now await return shipment to fulfill trade-in requirements.

Outages

No outages were reported in July.

New Growth

As of August 1:

Institution July Overall
Boston College 0 2,361
Columbia University 0 65,033
Cornell University 3,578 427,014
Duke University 0 4,523
Harvard University 0 236,069
Indiana University 39 195,336
Library of Congress 0 89,724
North Carolina State University 0 3,196
Northwestern University 137 35,481
New York Public Library 2 288,356
Penn State University 4,427 64,064
Princeton University 1 251,705
Purdue University 0 44,692
Universidad Complutense 0 111,983
University of California 3,951 3,395,242
The University of Chicago 2,262 33,074
University of Florida 0 2,068
University of Illinois 4 111,129
University of Michigan 3,380 4,650,513
University of Minnesota 549 107,892
University of North Carolina, Chapel Hill 0 16,588
University of Wisconsin 66 555,810
University of Virginia 2 50,817
Utah State 0 117
Yale University 0 23,678
Total 18,398 10,766,465

Public Domain (~31%)

Total* 24,776 3,430,208

* Includes volumes opened through copyright review and rights holder permissions

Summary of Issues Received by User Support

Issue Type July June
Content 322 342
  Quality 313 329
  Collections 8 13
Cataloging 140 81
Access and Use 190 202
  Copyright 125 148
  Permissions 8 10
  Takedown 2 0
  Print on Demand 0 1
  Inter-library loan 2 4
  Full-PDF or e-copy requests 16 12
  Datasets 1 4
  Data Availability and APIs 0 0
  Reuse of content 5 1
Web applications 27 20
  Functionality problems 8 9
  Problems with login specifically 3 2
  General Questions about Login 2 0
  Partners setting up login 2 1
  Usability issues 2 1
  Feature requests 2 1
Partner Ingest 5 3
General 39 34
  Partnership 7 8
  Infrastructure 0 0
  Miscellaneous 32 26
Total 723 670

Most Accessed Volumes

Title
Quicksand, by Nella Larsen.
Roster of the Confederate soldiers of Georgia, 1861-1865, v.1.
Kinematics and Dynamics of Plane Mechanisms, by Jeremy Hirschhorn.
Plane and Spherical Trigonometry with Applications, by William L. Hart.
Department of Defense Appropriations for 1970, v.6 (pt.6).
The Book of a Hundred Hands, by George Brant Bridgman.
One Damned Island After Another, by Clive Howard.
The Human Figure, by John H. Vanderpoel.
Town Planning in Practice: an Introduction to the Art of Designing Cities and Suburbs, by Raymond Unwin.
A Treatise on Money, v.1 1930, by John Maynard Keynes.

August Forecast

  • Continue work to support full-text indexing of JATS articles.
  • Complete processes to produce ePub and PDF from JATS.
  • Continue to explore improvements to full-text search relevancy ranking.

Presentations

You can follow HathiTrust on Twitter or Facebook, or subscribe to receive email updates (via Google Groups).