
Update on March 2015 Activities

April 10, 2015


Top News


2015 HathiTrust Research Center UnCamp

The 3rd annual HTRC UnCamp was held at the University of Michigan on March 30 and 31. More than 130 registrants attended the event, in addition to HTRC staff from Indiana University and the University of Illinois. The UnCamp included keynote addresses by Michelle Alexopoulos (University of Toronto) and Erez Lieberman Aiden (Baylor University), along with numerous other presentations, posters, and demonstrations. Presentation materials and other information are available at http://www.hathitrust.org/htrc_uncamp2015. UnCamp provides a unique, hands-on opportunity for HTRC users and collaborators to become familiar with the changing tools and services, and for HTRC staff to learn what users want and apply that information in the evolution of the HTRC. As in past years, the event was a great success on both counts, and the HTRC would like to thank all attendees, participants, and speakers!

5 Million Open Volumes

HathiTrust achieved a major milestone in March, surpassing 5 million “open” volumes, including materials that are both in the public domain and open access. Read more in a blog post by Executive Director Mike Furlough.

Duke University Press Opens Access to Backfile Publications

Duke University Press opened access to more than 140 backfile publications in HathiTrust. Read the full announcement.

Revised Bylaws Posted

A revised version of the Bylaws of HathiTrust, as amended by the members in February 2015, is now available.

US Federal Government Documents Initiative Reports Now Available

In October 2014 the Government Documents Initiative Planning and Advisory Working Group submitted a status report on HathiTrust’s Government Documents Initiative, with recommendations for further action and investment in the program. The HathiTrust Program Steering Committee (PSC) endorsed the recommendations in principle and proposed to the Board of Governors that they be used as a guide for further action. At its February 2015 meeting, the Board of Governors endorsed the recommendations of the Advisory Working Group and PSC, noting the impact that this initiative would have for member libraries and the public at large, as well as its potential to reinforce the actions planned in other initiatives of the membership. The Board directed Mike Furlough and HathiTrust staff to develop a preliminary implementation and staffing plan to be discussed at the May 2015 Board of Governors meeting in San Francisco. While that plan is being developed, we are publishing both the Planning and Advisory Working Group report and the Program Steering Committee report to inform the membership and wider community of discussions and planning to date. Further details on the next steps for the Government Documents Initiative will be coming in the next few months.

User Support Working Group Nominations

The User Support Working Group is seeking nominations for up to 2 new members. We are seeking staff who have expertise in providing general user support and those who have expertise in cataloging in particular. Nominations are due May 15, 2015. To submit nominations and for further information about the working group, please visit http://tinyurl.com/m9qlyyg. 

Ingest


Google-digitized Content

HathiTrust began ingest of materials from the University of Iowa, consisting mainly of US federal government documents.

Locally-digitized Content

HathiTrust corresponded with Northwestern University, Princeton University, and Boston College about submission of new content, and ingested additional content from Emory University and the University of Illinois.

Bibliographic Data Management

The California Digital Library loaded 85,457 new or updated records into Zephir.

Projects


Copyright Review

A summary of the determinations from HathiTrust copyright review activities in March is given below. See CRMS-US and CRMS-World for further information. The CRMS projects are funded by the Institute for Museum and Library Services.

 

             March                          March                Overall                        Overall
             Public Domain Determinations   All Determinations   Public Domain Determinations   All Determinations
CRMS-US      865                            1,352                170,239                        321,945
CRMS-World   4,006                          6,942                100,740                        189,575
Total        4,871                          8,294                270,979                        511,520

Government Documents Registry

Project staff made progress on an initial mechanism for conducting manual review of bibliographic records to determine whether and how the works described by the records are related (e.g., as duplicates). The mechanism will be tested throughout April and May, and will be used especially in cases where the automated relationship detection produced a lower-confidence determination.

HathiTrust Research Center Updates

The HTRC renamed its services environment to SHARC (Secure HathiTrust Analytic Research Commons) and released SHARC v3.1 on March 26, 2015. The release was mainly aimed at bug fixes, but also includes a small set of new features. Changes to the main production service include:

  • Workset Listing by My Worksets and All Worksets.
  • Algorithm results have a sortable Algorithm column, and the default order is by Time.
  • Email for user account registration now comes from sharc@indiana.edu
  • Data Capsule results are only available after a human review
  • API for Public Worksets, used by the beta version Bookworm (in the Sandbox) 

Changes to the Sandbox service include:

  • Initial alpha version of Bookworm
  • Updates to the alpha version Feature Extraction

We welcome community scholars to try the 3.1 release and provide feedback. Please send comments to htrc-tech-help-l@list.indiana.edu. Notes on the previous 3.0 release are available on the HTRC wiki.

HTRC staff created a LibGuide for the HT+Bookworm tool prototype.

For those interested in a more advanced deep dive on topic exploration within the HTRC Data Capsule, we will be offering a tutorial at JCDL 2015 on June 21 in Knoxville, TN. For more information, see http://www.jcdl2015.org/tutorials-workshops (Registration is now open!).

Development Updates


Development updates and activities by HathiTrust institutions included the following:

Full-text Search

  • Released code to take advantage of item-level date information for serials. The new date information will be available in HathiTrust search interfaces when the repository is re-indexed in April. 
  • Tested a Solr plug-in to reduce memory use in Solr 4. Testing found an order-of-magnitude reduction in memory use for the in-memory version of the “index to the index” for full-text search. The plug-in will also be put into production during the next re-indexing job. This will allow an increase in the number of shards serving the index from 12 to 18 without additional hardware, increase the amount of memory available for OS I/O caching on the search servers, and facilitate future testing of relevance ranking.
  • Began work on a test framework for relevance ranking.
  • Created a prototype interface that displays search results from two different ranking algorithms side-by-side.
  • Fixed a bug in indexing code that introduced errors in the item information for some serials. The affected volumes were re-indexed to restore the correct information.

Handle Service

  • Transferred the Handle (persistent identifier) service to new servers.

Page Turner

  • Added language to identify works that are in the public domain but for which access is limited due to privacy concerns.
  • Deployed improvements to accessibility features with particular attention to support for new content types.
  • Updated the PDF generation process to use coordinate OCR information where it is available, allowing highlighting of search results in downloaded PDFs.

Recording Sources of Digital Objects

  • Implemented a new strategy for recording administrative information about digital objects, such as who digitized and deposited the content.
  • Implemented a single scheme for identifying institutions across repository systems in conjunction with this change.

Staff at the University of Michigan and the California Digital Library coordinated to harmonize the administrative information between the repository and bibliographic data management systems. 

Zephir

  • Released an API to facilitate use of bibliographic records managed in Zephir in workflows for content ingest.
  • Discussed strategies for improving tracking and reporting about records and digital items submitted for deposit in HathiTrust, and possible modifications to the workflow for processing record corrections. 
  • Discussed a preliminary draft of policies for handling requests for reporting from Zephir and Zephir system enhancements. 

Papers and Presentations


April Forecast


  • Begin production phase-in of high-performance storage system for full-text search
  • Release Solr plug-in to reduce memory use into production and begin the process of full-text reindexing.
  • Continue work on a test framework for relevance ranking, including interleaving of search results for the comparison of ranking algorithms.
  • Continue testing mechanism for manual bibliographic record relationship detection.
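
The forecast above mentions interleaving search results to compare ranking algorithms. The report does not say which interleaving method is planned, so the sketch below uses team-draft interleaving, a widely used method, purely as an illustration. The function names and volume identifiers are hypothetical and are not taken from HathiTrust code.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Merge two ranked lists into one, remembering which ranker supplied each item.

    Returns (merged, credit), where credit maps each item to "A" or "B".
    """
    rng = rng or random.Random()
    sources = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    merged, credit = [], {}
    total = len(set(ranking_a) | set(ranking_b))
    while len(merged) < total:
        # The team with fewer picks goes next; ties are broken by a coin flip.
        if picks["A"] != picks["B"]:
            team = "A" if picks["A"] < picks["B"] else "B"
        else:
            team = rng.choice(["A", "B"])
        # Take that team's highest-ranked item that is not yet in the merged list.
        candidate = next((d for d in sources[team] if d not in credit), None)
        if candidate is None:
            # This team has nothing new to offer, so the other team picks instead.
            team = "B" if team == "A" else "A"
            candidate = next(d for d in sources[team] if d not in credit)
        merged.append(candidate)
        credit[candidate] = team
        picks[team] += 1
    return merged, credit


def score_clicks(credit, clicked):
    """Count clicked items per contributing ranker; more clicks suggests a preferred ranker."""
    wins = {"A": 0, "B": 0}
    for doc in clicked:
        if doc in credit:
            wins[credit[doc]] += 1
    return wins


# Hypothetical volume identifiers, for illustration only.
ranker_a = ["vol1", "vol2", "vol3", "vol4"]
ranker_b = ["vol3", "vol1", "vol5", "vol2"]
merged, credit = team_draft_interleave(ranker_a, ranker_b, random.Random(0))
print(merged)
print(score_clicks(credit, clicked=merged[:2]))
```

In this scheme users see a single merged list, and clicks are credited to whichever ranker contributed the clicked item. This complements a side-by-side display, which shows the two rankings separately.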

New Growth


As of April 1:

  March Overall
Boston College 294 3,557
Columbia University 0 73,396
Cornell University 9 515,753
Duke University 409 8,615
Emory University 128 180
Getty Research Institute 458 20,588
Harvard University 0 838,122
Indiana University 32 529,798
Keio University 2 90,122
Knowledge Unlatched 0 28
Library of Congress 0 108,892
McGill University 0 893
New York Public Library 6 304,610
North Carolina State University 0 3,196
Northwestern University 8 57,000
Ohio State University 5,431 47,525
Penn State University 27 389,247
Princeton University 0 252,841
Purdue University 0 47,488
Sterling & Francine Clark Art Institute 0 358
Texas A&M University 0 2,446
Universidad Complutense 12 117,322
University of Alberta 0 76,106
University of California 14,888 3,639,937
The University of Chicago 579 56,981
University of Connecticut 0 4,637
University of Delaware 0 48
University of Florida 0 9,866
University of Illinois 9,818 348,946
University of Iowa 7,551 7,551
University of Massachusetts, Amherst 0 12,007
University of Michigan 757 4,722,050
University of Minnesota 586 334,249
University of Missouri 0 1
University of North Carolina, Chapel Hill 0 17,025
University of Virginia 408 51,207
University of Wisconsin 0 561,534
Utah State 0 117
Yale University 0 23,832
Total 41,403 13,305,071

Public Domain (~37%)

Total* 37,361 5,004,951

* Includes volumes opened through copyright review and rights holder permissions
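
The share of open volumes can be checked against the totals in the table above. The short calculation below is a sketch that uses only the figures reported here; the variable and function names are illustrative. It gives roughly 37.6% for the overall collection, in line with the approximate figure stated above.

```python
# Figures copied from the "Total" and "Public Domain ... Total*" rows above.
total_march, total_overall = 41_403, 13_305_071
open_march, open_overall = 37_361, 5_004_951


def share_percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


overall_share = share_percent(open_overall, total_overall)
march_share = share_percent(open_march, total_march)
print(f"Overall: {overall_share:.1f}% of volumes are open")
print(f"March:   {march_share:.1f}% of newly added volumes are open")
```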

Summary of Issues Received by User Support


Issue Type                            March 2015   February 2014
Content                               148          227
  Quality                             144          211
  Collections                         13           12
Cataloging                            154          164
Access and Use                        130          157
  Copyright                           65           105
  Permissions                         9            18
  Takedown                            0            0
  Print on Demand                     0            0
  Inter-library loan                  2            2
  Full-PDF or e-copy requests         29           12
  Datasets                            2            5
  Data Availability and APIs          0            3
  Reuse of content                    3            6
Web applications                      47           41
  Functionality problems              27           25
  Problems with login specifically    1            1
  General Questions about Login       1            3
  Partners setting up login           1            0
  Usability issues                    0            0
  Feature requests                    0            1
Partner Ingest                        10           7
General                               138          134
  Partnership                         6            8
  Miscellaneous                       132          126
Total                                 637          730

Most Accessed Volumes


Title
Quicksand, by Nella Larsen.
Solid Mensuration, by Willis F. Kern and James R. Bland.
The Human Figure, by John H. Vanderpoel.
Roster of the Confederate soldiers of Georgia, 1861-1865, v.1.
History of wages in the United States from Colonial times to 1928, United States Department of Labor.
Roster of the Confederate soldiers of Georgia, 1861-1865, v.2.
Abstracts of old Ninety-six and Abbeville District wills and bonds, as on file in the Abbeville, South Carolina, courthouse.
The Five Laws of Library Science, by S. R. Ranganathan.
Godey's Magazine, v.40-41, 1850.
Roster of the Confederate soldiers of Georgia, 1861-1865, v.3.

Availability


Repository

Cumulative 12-month availability of repository access*: 99.971% (-0.001%). 

HathiTrust objects may have briefly been unavailable on Tuesday, March 24 from 15:45-15:50 ET through their handle.net persistent URLs due to a server misconfiguration. Staff immediately noticed and corrected the problem.

From February 24, 2015 through March 24, 2015, some users were denied access to volumes with a rights determination of “public domain only within the United States” due to a bug in coding changes designed to test for access to HathiTrust from proxy servers.

* Repository access refers to page viewing and full-text search functionality, i.e., user-facing applications. It does not refer to preservation or storage infrastructure, which is under continual operation.
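
To put the cumulative availability figure above in more concrete terms, the percentage can be converted into hours of unavailability. The sketch below assumes a 365-day year and treats the figure as a simple uptime fraction; the report does not describe how the measurement is actually made, and the names used are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; an assumption that ignores leap days


def downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Convert an availability percentage into hours of unavailability over a period."""
    return (1 - availability_percent / 100) * period_hours


hours = downtime_hours(99.971)
print(f"{hours:.2f} hours, or about {hours * 60:.0f} minutes, over 12 months")
```

Under these assumptions, 99.971% corresponds to roughly two and a half hours of unavailability over 12 months, and a change of 0.001 percentage points corresponds to about five minutes.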