Update on April 2015 Activities

May 14, 2015

Top News

Research Center Releases Important Dataset

The HathiTrust Research Center has released the Extracted Features Dataset, derived from 4.8 million public domain volumes in the HathiTrust collection. The release will support analysis of large worksets of volumes in the HathiTrust public domain collection, at scales previously intractable for most individual researchers. For example, page-level token (word) counts can be used to help build topic models and classifications and to perform other text analytics. http://www.hathitrust.org/htrc-releases-massive-dataset

Spring Board of Governors Meeting

The Board of Governors held its spring 2015 meeting on May 1 in Berkeley, CA.   In addition to regular updates, the Board discussed and took action on the following matters: 

Print Monograph Archive Planning:  The Board reviewed the recommendations of the Print Monograph Archive Planning Task Force and the cover report and recommendations of the Program Steering Committee.  The Board commended the Task Force on its excellent report and discussed how the initiative would be implemented.  A summary of the reports' recommendations is being prepared for comment by the broader library community and is expected to be available in advance of ALA.

Government Documents:  To clarify the scope of the initiative, the Board has officially retitled it the US Federal Documents Initiative.  The Board gave final budget approval to hire a program officer to oversee this initiative.  

Staffing:  The Board gave final budget approval for a staff position in the Executive Director’s office to support user and member services, documentation, and project management. 

Budget and Strategic Planning:  In preparation for the 2016 budget process, the Board discussed principles for long-range financial planning and management.

Membership strategy:  The Board discussed several membership inquiries and the current criteria for membership in HathiTrust.  Mike Furlough was tasked with consulting a small group of directors, to be named, on membership strategies and with drafting updated criteria for consideration by the Board and eventually the membership.

New Blog Post Highlights Efforts to Improve Quality

Jeremy York and Kat Hagedorn have written a blog post explaining how HathiTrust addresses reported problems with digitized volumes in its collection.

Nominating Committee Named

Appointees to the 2015 HathiTrust Nominating Committee have been named: 

  • Alberta Comer, Dean of the J. Willard Marriott Library, and University Librarian, University of Utah
  • Robert Gerrity, University Librarian, University of Queensland
  • Lorraine Haricombe, Vice Provost and Director of Libraries, University of Texas Austin
  • Karen Williams, Dean of University Libraries, University of Arizona

The 2015 Nominating Committee will be chaired by past chair of the Board Sarah Michalak, Associate Provost and University Librarian, University of North Carolina Chapel Hill.  

The HathiTrust Nominating Committee has responsibility for soliciting nominees for the Board of Governors and candidates for the Program Steering Committee. In fall 2015 HathiTrust will hold its first regular election for new Board members since initiating the current governance model in 2012.  

Program Steering Committee

In April, the Program Steering Committee focused primarily on review and analysis of the report and recommendations of the HathiTrust Monograph Archive Planning Task Force.  PSC forwarded its own cover report and recommendations to the Board of Governors for consideration at their May 1 meeting. PSC has continued work on several major issues identified last fall: 1) developing a framework to plan for new collection formats; 2) improving quality validation and assessment; 3) improving metadata quality and policy development; and 4) creating a framework for development proposals.

User Support Working Group Nominations

The User Support Working Group is seeking nominations for up to 2 new members. We are seeking staff who have expertise in providing general user support and those who have expertise in cataloging in particular. The nomination period has been extended to May 22, 2015. To submit nominations and for further information about the working group, please visit http://tinyurl.com/m9qlyyg.

HathiTrust Member Update Webcasts

HathiTrust will host two webcasts this summer to provide an update on current activities and member services.  All staff from member libraries, especially libraries that have recently joined HathiTrust, are encouraged to attend.   Registration details and dates will be announced by early June.

HathiTrust Participates in Planning Grant for Services to Students with Disabilities

Mike Furlough, Executive Director, and J. Stephen Downie, Co-director of the HathiTrust Research Center, will serve on the steering committee of an IMLS planning grant awarded to Tufts University.  Titled “Repository Services for Accessible Course Content,” the project will be led by Laura Wood, Director of Tisch Library, Tufts University, and John Unsworth, University Librarian and CIO, Brandeis University.  Over the course of one year, this planning project will bring together experts from disability/accessibility services with librarians, IT professionals, advocates, and legal counsel to develop shared infrastructure within which universities can support their students with disabilities.  HathiTrust currently provides students who have a print disability and who are enrolled at a member institution with access to in-copyright works in the collection.


Internet Archive

Staff worked with Duke University, Washington University, and Columbia University. Tufts University successfully submitted their first batch of content for ingest.

Locally-digitized Content

Staff worked with Boston College, Virginia Tech, Northwestern University, Cornell University, Texas A&M, and Princeton University to resolve questions. Content was ingested from University of Illinois, Urbana Champaign. University of Washington successfully ingested their first batch of content.

Bibliographic Data Management

The California Digital Library (CDL) loaded 78,778 new records and 43,357 update records.


Copyright Review

A summary of the determinations from HathiTrust copyright review activities in March is given below. See CRMS-US and CRMS-World for further information. The CRMS projects are funded by the Institute of Museum and Library Services.




Public Domain Determinations    All Determinations    Public Domain Determinations    All Determinations
                         792                 1,154                         171,047
                       4,188                 7,594                         105,105               197,495
                       4,980                 8,748                         276,152               520,624

Government Documents Registry

As of April 30th there are 621,188 government documents in HathiTrust. 

Over 7.8 million records are currently included in the Registry and testing continues on algorithms to identify related items. An alpha version of the US Federal Documents Registry will be available in June, and staff will be seeking feedback on initial functionality and refining potential use cases for the Registry.

Two University of Washington iSchool students have been working with project staff since January on approaches to the manual review of record pairs identified by the relationship detection process as being potentially related. In April they began reviewing record pairs and making decisions as to whether or not the records were for duplicate items.   

A report on activities from the past six months is now available.

HathiTrust Research Center Updates

The Research Center has released the Extracted Features Dataset (v.0.2). http://www.hathitrust.org/htrc-releases-massive-dataset

The Research Center held its monthly user group meeting on April 30, 2015. Sayan Bhattacharyya presented the HTRC + Bookworm project and demoed the online interactive system to the participants, also fielding questions and feedback from the HTRC user base.

HTRC UnCamp 2015 was featured in a Digital Library Federation (DLF) blog post written by three staff members at the University of Michigan Library. http://www.diglib.org/archives/8289/

J. Stephen Downie traveled to Brown University, Bryn Mawr College, Haverford College and Swarthmore College to present generally on SHARC services and their role in instruction in the classroom.

Development Updates

Development updates and activities by HathiTrust institutions included the following:

Full-text Search

  • Re-indexing on additional new hardware began in April, and the new index is expected to be in production in May.  Once complete, this should improve search performance.
  • Staff prototyped a method of relevance ranking using “balanced interleaving,” which allows for comparison of different weights in metadata fields and OCR.  
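
For readers unfamiliar with the technique, the following is a minimal sketch of balanced interleaving as it is commonly described in the information retrieval literature. It is not HathiTrust's implementation: the function names, the document IDs, and the click-based comparison step are assumptions made for illustration only. Two candidate rankings (for example, two different field weightings) are merged into one result list, and clicks on the merged list are then credited back to whichever ranking placed the clicked items higher.

```python
from typing import List, Set


def balanced_interleave(a: List[str], b: List[str], a_first: bool) -> List[str]:
    """Merge rankings a and b by alternating between them, skipping duplicates.

    a_first decides which ranking contributes first; in live use it would
    normally be chosen at random for each query.
    """
    merged: List[str] = []
    seen: Set[str] = set()
    ka = kb = 0  # how far we have read into a and b
    while ka < len(a) and kb < len(b):
        if ka < kb or (ka == kb and a_first):
            doc = a[ka]
            ka += 1
        else:
            doc = b[kb]
            kb += 1
        if doc not in seen:
            seen.add(doc)
            merged.append(doc)
    return merged


def preferred_ranking(a: List[str], b: List[str],
                      merged: List[str], clicked: Set[str]) -> str:
    """Return "A", "B", or "tie" based on clicks on the merged list."""
    clicked_positions = [i for i, doc in enumerate(merged) if doc in clicked]
    if not clicked_positions:
        return "tie"
    # Lowest-ranked clicked document in the merged list.
    lowest = merged[max(clicked_positions)]
    # Smallest prefix length of a or b that already contains that document.
    k = min(r.index(lowest) + 1 for r in (a, b) if lowest in r)
    hits_a = sum(doc in clicked for doc in a[:k])
    hits_b = sum(doc in clicked for doc in b[:k])
    if hits_a == hits_b:
        return "tie"
    return "A" if hits_a > hits_b else "B"
```

In this sketch, calling `balanced_interleave` on two rankings of hypothetical volume IDs and then passing the set of clicked IDs to `preferred_ranking` indicates which ranking the clicks favor for that query. Aggregating that outcome over many queries is what allows two weighting schemes to be compared.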


  • Upgraded load balancers to better support current HTTPS best practices
  • Changed ingest reports and ingest logs to be listed by content provider and digitization source instead of by namespace

Page Turner

  • Content provider information is now sent to Google Analytics during item access.

Papers and Presentations

May Forecast

  • Put the Solr plug-in that reduces memory use into production and complete full-text reindexing.

  • Continue work on a test framework for relevance ranking, including interleaving of search results for the comparison of ranking algorithms.

  • Add social sharing options to PageTurner and Collection Builder

New Growth

Ingest numbers as of May 1 can be found here: http://www.hathitrust.org/statistics_deposited_volumes_monthly

Public Domain (~38%)
Total*                                                                51,346 5,056,297

* Includes volumes opened through copyright review and rights holder permissions

Summary of Issues Received by User Support

Issue Type                            April 2015   March 2015
Content                                      156          148
  (label missing)                            140          144
  (label missing)                             16           13
Cataloging                                   158          154
Access and Use                               133          130
  (label missing)                             73           65
  (label missing)                             14            9
  (label missing)                              2            0
  Print on Demand                              0            0
  Inter-library loan                           4            2
  Full-PDF or e-copy requests                 29           29
  (label missing)                              0            2
  Data Availability and APIs                   4            0
  Reuse of content                             4            3
Web applications                              47           47
  Functionality problems                      15           27
  Problems with login specifically             1            1
  General Questions about Login                3            1
  Partners setting up login                    0            1
  Usability issues                             0            0
  Feature requests                             2            0
Partner Ingest                                15           10
General                                      115          138
  (label missing)                             17            6
  (label missing)                             98          132
Total                                        607          637

*See User Support Working Group Issue Types for a description of the types of issues included in each category.

Most Accessed Volumes

Quicksand, by Nella Larsen.
Modern California Houses: Case Study Houses, 1945-1962, by Esther McCoy.
Design of Equilibrium Stage Processes, by Buford D. Smith.
Godey's Magazine, v. 40-41, 1850.
Solid Mensuration, by Willis F. Kern and James R. Bland.
The Lesson of Japanese Architecture, by Jiro Harada.
War in New Guinea, Official War Photographs of the Battle for Australia.
The Human Figure, by John H. Vanderpoel.
Quintus Curtius (History of Alexander), Vol. 1, with an English translation by John C. Rolfe.
Roster of the Confederate soldiers of Georgia, 1861-1865, v. 3.
Roster of the Confederate soldiers of Georgia, 1861-1865, v. 1.



Cumulative 12-month availability of repository access*: 99.975% (+0.004%). 

* Repository access refers to page viewing and full-text search functionality, i.e., user-facing applications. It does not refer to preservation or storage infrastructure, which is under continual operation.