Update on May/June 2015 Activities

June 24, 2015

Top News


Thank You and Goodbye to Jeremy York

Many of HathiTrust’s members and partners have gotten to know our assistant director, Jeremy York, very well. He has been involved in so many areas of HathiTrust’s development over the last seven years that it’s not possible to fully document his contributions. Jeremy has recently decided that the time has come for him to leave HathiTrust and pursue new activities. Ultimately he intends to return to graduate school and pursue a PhD in information studies. In the meantime, he’s accepted a new position managing a funded research grant on digital preservation and publicly funded data. His last day with HathiTrust will be June 23. You can read more about Jeremy’s work and plans in this post.

Shared Print Report Released for Comment

The HathiTrust Print Monograph Archive Planning Task Force Final Report is now available for public review and comment. The Task Force recommends a series of actions to rapidly develop the program, including initiating discussions with archives and libraries to secure retention commitments for approximately 50% of the unique titles in HathiTrust, developing infrastructure through partnerships, seeking community comment, and establishing a Shared Print Operating Committee to continue planning.

If you are planning to attend ALA, this program will be discussed during the Print Archive Network (PAN) meeting, sponsored by the Center for Research Libraries, on Friday June 25 at 9am (check location and times at http://alaac15.ala.org/node/30197).

The Board of Governors thanks the members of the Task Force for the tremendous effort they put into developing a thoughtful, coherent, and rich set of recommendations. The membership included: Tom Teper, Chair (University of Illinois); Clem Guthro (Colby College); Robert Kieft (Occidental College); Erik Mitchell (University of California, Berkeley); Jake Nadal (ReCAP); Jo Anne Newyear Ramirez (University of British Columbia); Matthew Revitt, Recorder (University of Maine); Matthew Sheehey (Brandeis, formerly Harvard University); Emily Stambaugh (California Digital Library); and Karla Strieb (Ohio State).

We plan to have other opportunities to discuss this program publicly and to gather feedback. In the meantime, we welcome any comments or questions, specific or general; please send them to print-archive-comments@hathitrust.org.

Program Steering Committee Nominations

The Board of Governors welcomes nominations to fill a two-year term on the Program Steering Committee, commencing in August 2015. Nominations may be submitted by Member Representatives, but self-nominations are also welcome. Nominees should be at the AUL or senior management level to ensure an appropriate level of experience in the issues at hand.

Nominations should include the name, title, and institution of the nominee, and a short description of their qualifications for the appointment. Please send nominations to Melissa Stewart (mmstewa@hathitrust.org) with the subject line “HT PSC Nomination” by Friday, July 17, 2015. Sarah Michalak, Past Chair of the Board of Governors and Chair of the Nominating Committee, is coordinating the nominations and appointment process.

The Program Steering Committee “Reviews HathiTrust’s development agenda, shaping initiatives and strategies for Board discussion and decision-making, and considering the implications of those initiatives for the future.” The Committee meets virtually roughly biweekly, and may hold one to two in-person meetings per year. Much of the Committee’s work is carried out through working groups or task forces formed to address specific issues and initiatives. For more information, see http://www.hathitrust.org/psc.

Zephir Advisory Group Appointed

The Zephir Advisory Group has been formally appointed and has begun its work. Membership includes:

  • Patti Martin, California Digital Library (Chair)
  • Gary Charonneau, Indiana University
  • Todd Grappone, UCLA
  • Chew Chiat Naun, Cornell University
  • John Mark Ockerbloom, University of Pennsylvania
  • Jonathan Rothman, University of Michigan
  • Ryan Rotter, University of Michigan
  • Katheryn Stine, California Digital Library

Their charge can be found at http://www.hathitrust.org/wg_zag_charge.

Getting Locally Digitized Content into HathiTrust

How can your library add locally digitized materials to HathiTrust?  Aaron Elkiss of the University of Michigan has written a short post with some background and details of current practices.

Register for Webcasts for New Members

HathiTrust will host two webcasts later this summer to provide an overview for members who have recently joined. All members are welcome to attend these sessions, which will be held on July 30 at 4:00pm EDT and August 5 at 11:00am EDT.

We ask that all attendees register, and urge you to organize group viewing sessions at your library.  You may register here: http://goo.gl/forms/BZx1sWbRSW

Access information will be provided to registrants before these events. 

Ingest


Locally-digitized Content

Staff working on HathiTrust processes ingested locally-digitized content from the University of Illinois at Urbana-Champaign, the University of Missouri, and the University of Delaware. They also communicated with the Frick Collection, Universidad Complutense de Madrid, Princeton University, Northwestern University, University of North Carolina, University of Florida, University of Alabama, Boston College, and University of Maryland.

Bibliographic Data Management

The California Digital Library (CDL) loaded 66,463 new records and 366,197 updated records.

Projects


Copyright Review

A summary of the determinations from HathiTrust copyright review activities in May is given below. See CRMS-US and CRMS-World for further information. The CRMS projects are funded by the Institute for Museum and Library Services.

 

              May                                Overall
              Public Domain    All               Public Domain    All
              Determinations   Determinations    Determinations   Determinations
CRMS-US                  966            1,636           171,997          324,735
CRMS-World             3,803            7,595           108,731          204,764
Total                  4,769            9,231           280,728          529,499

Government Documents Registry

As of June 2, there are 631,197 US federal documents in HathiTrust.

Staff have been preparing the alpha release of the US Federal Government Documents Registry. Initial access to the data will be via a graphical interface, though there are plans to develop an API in the future. More than 15 million records have gone through the relationship detection process, yielding 3,661,389 clusters and 845,342 distinct items.

Work has also progressed on the manual review of records that aren’t obviously duplicates, but aren’t necessarily distinct. A rudimentary system is currently in place, allowing for some reviews to be made. Once enough decisions have been made, we’ll use that information to refine the relationship detection process.

HathiTrust Research Center Updates

HTRC’s Secure HathiTrust Analytic Commons (SHARC) can now be accessed by this URL: https://sharc.hathitrust.org.

Users should be aware that the SHARC team has set a monthly maintenance window for the first Tuesday of each month, starting in May. This gives the team a chance to apply patches, fixes, and small changes, and to perform cleanup that requires the service to be down.

Eric Lease Morgan has put together the “HathiTrust Research Center Workset Browser,” which he describes as a “fledgling tool for doing distant text mining against the corpora from the HathiTrust.” The Research Center’s monthly user group meeting on June 11 featured Eric discussing this proof-of-concept tool. See more in Eric’s blog post: http://blogs.nd.edu/emorgan/2015/05/htrc-workset-browser/

Eleanor Dickson joined the HTRC this month as Digital Humanities Specialist in the University of Illinois Library. Eleanor, a native of California, comes from Emory University where she was a Research Library Fellow in the Emory Center for Digital Scholarship and Emory’s Manuscript, Archives, and Rare Book Library. She gained her master’s degree in Information Studies from the University of Texas at Austin and a bachelor’s degree in History and English from the University of California, Santa Barbara. Passionate about instruction, open data, and digital scholarship, Eleanor looks forward to exploring these areas and more in her new position.

Development Updates


Development updates and activities by HathiTrust institutions included the following:

Full-text Search

  • The development Solr environment was migrated to a new test server, and the test scripts, debugging tools, and Solr configuration were tested and adjusted for the new environment. The new test server will allow more realistic performance testing of new Solr features and the re-testing of older features against the Solid State Drives.

Infrastructure

  • Core services worked with the vendor to resolve hardware issues preventing a system software upgrade and successfully performed the upgrade for the cluster at the University of Michigan. The storage cluster at Indiana University-Purdue University Indianapolis will be upgraded by the end of June.

Page Turner

  • Social Toolbar: Added a “social toolbar” to PageTurner and Collection Builder listings for users to readily share links to HathiTrust content/collections. PageTurner provides metadata for Facebook and Twitter to enhance those links (e.g. https://twitter.com/EdenKristina/status/598832485666070528).

Papers and Presentations

“Text mining with the HathiTrust Research Center: An introduction to working with digitized text corpora and metadata.” Workshop; at the Annual Conference of the Humanities, Arts, Science, and Technology Alliance and Collaboratory (HASTAC), Michigan State University, East Lansing, MI. 30 May 2015.

Upcoming Presentations

Summer Forecast

  • Put the Solr plug-in to reduce memory use into production and complete the process of full-text re-indexing.

  • Continue work on a test framework for relevance ranking, including interleaving of search results for the comparison of ranking algorithms.

  • Remove all uses of and dependencies on CoSign for HathiTrust member authentication, so that authentication relies only on Shibboleth.
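
As an illustration of the interleaving idea mentioned in the relevance ranking item above, here is a minimal Python sketch of team-draft interleaving, one common way to compare two ranking algorithms using clicks on a single merged result list. This update does not say which interleaving method HathiTrust is testing, so the method, the function names, and the document ids below are all assumptions made for the example.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng=None):
    """Merge two ranked lists of document ids, team-draft style.

    Returns (interleaved, teams); teams[i] is "A" or "B", the ranker
    credited with contributing interleaved[i].
    """
    rng = rng or random.Random()
    sources = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    interleaved, teams, seen = [], [], set()

    def next_unseen(ranking):
        # Highest-ranked document not already in the merged list.
        for doc in ranking:
            if doc not in seen:
                return doc
        return None

    while len(interleaved) < length:
        # The team with fewer picks goes first; ties use a coin flip.
        if picks["A"] < picks["B"]:
            order = ["A", "B"]
        elif picks["B"] < picks["A"]:
            order = ["B", "A"]
        else:
            order = rng.sample(["A", "B"], 2)
        for team in order:
            doc = next_unseen(sources[team])
            if doc is not None:
                interleaved.append(doc)
                teams.append(team)
                seen.add(doc)
                picks[team] += 1
                break
        else:
            break  # both rankings are exhausted
    return interleaved, teams


def credit_clicks(teams, clicked_positions):
    """Count clicks per team for one query's merged result list."""
    credit = {"A": 0, "B": 0}
    for pos in clicked_positions:
        credit[teams[pos]] += 1
    return credit


# Hypothetical rankings from two algorithms for the same query.
ranking_a = ["d1", "d2", "d3", "d4"]
ranking_b = ["d3", "d1", "d5", "d6"]
merged, teams = team_draft_interleave(ranking_a, ranking_b, 4, random.Random(0))
```

Over many queries, the ranker whose contributed results collect more clicks would be judged the better one. A real test framework would also need click logging and a significance test, which this sketch leaves out.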

New Growth


Up-to-date Ingest numbers can be found here: http://www.hathitrust.org/statistics_deposited_volumes_monthly

  
Summary of Issues Received by User Support

Issue Type                              May 2015    April 2015
Content                                      154           156
    Quality                                  139           140
    Collections                               15            16
Cataloging                                   161           158
Access and Use                               140           133
    Copyright                                 65            73
    Permissions                               12            14
    Takedown                                   0             2
    Print on Demand                            0             0
    Inter-library loan                         0             4
    Full-PDF or e-copy requests               21            29
    Datasets                                   4             0
    Data Availability and APIs                 1             4
    Reuse of content                           1             4
Web applications                              29            47
    Functionality problems                    10            15
    Problems with login specifically           0             1
    General Questions about Login              0             3
    Partners setting up login                  0             0
    Usability issues                           0             0
    Feature requests                           2             2
Partner Ingest                                13            15
General                                       97           115
    Partnership                                6            17
    Miscellaneous                             91            98
Total                                        594           607

*See User Support Working Group Issue Types for a description of the types of issues included in each category.

Most Accessed Volumes


Title
Crosses in the Wind, by Joseph James Shomon.
War in New Guinea, Official War Photographs of the Battle for Australia.
The History of Royal College (formerly called the Colombo Academy), Written by Boys in the School, 1931.
Solid mensuration, by Willis F. Kern and James R. Bland.
Roster of the Confederate soldiers of Georgia, 1861-1865, v.2.
Quicksand, by Nella Larsen.
The Metco Metallizing Handbook, by H. S. Ingham & A. P. Shepard.
Roster of the Confederate soldiers of Georgia, 1861-1865, v.1.
Comparative Anatomy of the Vertebrates, by George C. Kent.
Roster of the Confederate soldiers of Georgia, 1861-1865, v.3.
The Human Figure, by John H. Vanderpoel.

Availability


Repository

Cumulative 12-month availability of repository access: 99.975% (+0.000%).

HathiTrust was briefly unavailable on Wednesday, May 6, from 11:28 to 11:34 ET, after the wrong server was rebooted because of an error in the server inventory. The inventory error has been corrected.

A bug introduced on Wednesday, May 27, prevented the download of full-book PDFs from 09:36 to 14:33 ET. PDFs appeared to be built (in PageTurner), but the actual download failed with a 500 error. The bug has been fixed.
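
To put the 99.975% availability figure in concrete terms, the short Python sketch below converts an availability percentage into the downtime it implies and computes the length of the May 6 outage. This report does not describe how the figure is calculated, so the 365-day window and the simple uptime-fraction formula are assumptions, and the function names are made up for the example.

```python
from datetime import datetime


def implied_downtime_minutes(availability_pct, period_days=365):
    """Minutes of downtime implied by an availability percentage.

    Assumes availability = uptime / total time over the period.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def outage_minutes(start, end, fmt="%H:%M"):
    """Length in minutes of an outage that starts and ends on the same day."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


# 99.975% over an assumed 365-day window is about 131.4 minutes of downtime.
budget = implied_downtime_minutes(99.975)

# The May 6 outage reported above, 11:28 to 11:34 ET, lasted 6 minutes.
may_6 = outage_minutes("11:28", "11:34")
```

Under these assumptions, the May 6 outage accounts for 6 of the roughly 131 minutes of downtime that a 99.975% figure would allow over twelve months.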

* Repository access refers to page viewing and full-text search functionality, i.e., user-facing applications. It does not refer to preservation or storage infrastructure, which is under continual operation.