
Update on June 2014 Activities

July 11, 2014


Top News


Save the Date: HathiTrust Member Meeting

The HathiTrust bylaws passed in 2013 call for “an Annual Meeting of the Members...for the transaction of such business as may come before the meeting.” We are pleased to announce that our first Annual Meeting will be held in Washington, DC on Friday, October 10, 2014.

We expect the meeting to include progress reports on ballot initiatives, official business, and opportunities to discuss future strategy. Details on location, schedule, and agenda will follow in the next few weeks. For now, we ask all official Member Representatives to plan to attend this meeting. If a representative cannot attend, a designee may attend in his or her place.

Ingest


Locally-digitized content

HathiTrust ingested a second batch of locally-digitized content from the University of Illinois and prepared to ingest materials from Boston College. HathiTrust also began conversations about ingest with Penn State University and Yale University, and continued communications about ingest with Emory University, University of Illinois at Urbana-Champaign, and University of Washington.

Internet Archive-digitized content

HathiTrust began ingest of content from McGill University (see http://bit.ly/1xSm5Aq) and corresponded with University of Massachusetts, Amherst about ingest of new materials.

Google-digitized content

Many volumes that Google scanned from partner institutions in the last year were not ingested because of a change in a quality metric, provided by Google, that HathiTrust uses to set thresholds for content entering the repository. In June, HathiTrust updated its use of the metric to restore the quality threshold for Google-digitized content to its previous level. The update will eventually bring more than 200,000 new volumes into the repository.

Projects


Copyright Review

A summary of the determinations from HathiTrust copyright review activities in June is given below. See CRMS-US and CRMS-World for further information.

 

            May                                  Overall
            Public Domain    All                 Public Domain    All
            Determinations   Determinations      Determinations   Determinations
CRMS-US     215              315                 165,340          314,270
CRMS-World  3,996            7,268               59,652           117,369
Total       4,211            7,583               224,992          431,639

Government Documents Registry

Project staff continued to develop strategies to identify and establish relationships between publications based on bibliographic information. This included work on rules to normalize descriptive terms and enumeration and chronology information, and rules to merge records. Staff continued to investigate methods to identify gaps in metadata, and began to think more concretely about how to engage the community in efforts to identify gaps and duplicate volumes.
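
As a rough illustration of what a normalization rule for enumeration and chronology information could look like, the sketch below reduces a few common volume, number, and year notations to one canonical key so that differently formatted records can be compared. The patterns, the function name normalize_enumcron, and the output format are assumptions made for this example; the project's actual rules are not described in this update.

```python
import re

# Hypothetical patterns for illustration only; not the registry's real rules.
_VOLUME = re.compile(r"\b(?:v|vol|volume)\.?\s*(\d+)", re.IGNORECASE)
_NUMBER = re.compile(r"\b(?:no|num|number)\.?\s*(\d+)", re.IGNORECASE)
_YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def normalize_enumcron(raw: str) -> str:
    """Reduce a free-text enumeration/chronology string to a canonical key."""
    parts = []
    volume = _VOLUME.search(raw)
    if volume:
        # int() strips leading zeros, so "02" and "2" compare equal.
        parts.append(f"v.{int(volume.group(1))}")
    number = _NUMBER.search(raw)
    if number:
        parts.append(f"no.{int(number.group(1))}")
    year = _YEAR.search(raw)
    if year:
        parts.append(year.group(1))
    return " ".join(parts)
```

Under these assumed rules, "Vol. 02 (1975)" and "v.2 1975" both reduce to the same key, which is the kind of match a record-merging rule could then act on.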

Development Updates


Authentication and Authorization

Staff deployed the new system for managing users who have special access to restricted materials (e.g., for copyright or quality review). The system includes functions to register new users for specific time frames, renew access with appropriate authorization, and automatically expire access, as well as back-end scripts for individual and batch renewal or expiration.
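
The register, renew, and expire functions described above can be pictured with a minimal sketch like the following. The class AccessGrant, its fields, and the helper expire_batch are hypothetical names chosen for this example; they do not describe the deployed system's actual design.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class AccessGrant:
    """Hypothetical record of one user's special access window."""
    user: str
    role: str          # e.g. "copyright review" or "quality review"
    expires_on: date   # last day on which access is valid

    def is_active(self, today: date) -> bool:
        return today <= self.expires_on

    def renew(self, days: int, authorized: bool, today: date) -> None:
        """Extend access from today, only with appropriate authorization."""
        if not authorized:
            raise PermissionError("renewal requires authorization")
        self.expires_on = today + timedelta(days=days)


def expire_batch(grants: list[AccessGrant], today: date) -> list[AccessGrant]:
    """Return only the grants still active on the given day."""
    return [g for g in grants if g.is_active(today)]
```

A scheduled job could call expire_batch once a day to drop lapsed grants, which is one simple way to get automatic expiration without manual cleanup.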

Full-text Search

The software update expected to resolve performance and stability problems with the high-performance storage system for full-text search was delayed, and staff remained in regular contact with the storage supplier about its availability. In the meantime, staff improved the new daily index update process, which is currently running in test mode on the new storage system, so that it more smoothly handles the large data updates that occur when the search index is fully rebuilt.

Staff investigated the suitability of the INEX 2007-2010 test collections to inform choices about relevance ranking algorithms for HathiTrust full-text search. 

Tom Burton-West wrote the second in a series of blog posts: “Practical Relevance Ranking for 11 Million Books, Part 2: Document Length and Relevance Ranking”.

PageTurner and Image Server

Staff prototyped new imgsrv capabilities for continuous text (e.g., JATS-encoded articles without page breaks) in PageTurner, demonstrating in-article search.

Server replacement cycle

Staff began installation of new full-text search servers. The servers are tentatively scheduled to enter service in July.

Availability

Cumulative 12-month availability: 99.867%

No outages were reported in June.
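
To put the cumulative figure in concrete terms, the short calculation below converts an availability percentage into hours of downtime. It assumes a 365-day measurement window and that availability is measured against total elapsed time; neither assumption is stated in this update, and the function name downtime_hours is made up for the example.

```python
def downtime_hours(availability_percent: float, period_hours: float = 365 * 24) -> float:
    """Hours of unavailability implied by an availability percentage.

    Assumes the percentage is measured over `period_hours` of elapsed time
    (default: a 365-day year), which is an assumption for this sketch.
    """
    return (1 - availability_percent / 100) * period_hours


print(f"{downtime_hours(99.867):.2f} hours")
```

Under those assumptions, 99.867% availability corresponds to roughly 11.65 hours of downtime over the 12-month period.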

New Growth

As of July 1:

  June Overall
Boston College 0 3,197
Columbia University 128 65,165
Cornell University 33,857 487,762
Duke University 0 7,774
Harvard University 630 238,065
Indiana University 416 196,082
Keio University 1,124 90,080
Knowledge Unlatched 5 24
Library of Congress 1 108,883
McGill University 893 893
New York Public Library 4 291,794
North Carolina State University 0 3,196
Northwestern University 18,754 56,398
Ohio State University 3,007 26,859
Penn State University 285 81,492
Princeton University 212 251,925
Purdue University 0 44,698
Sterling & Francine Clark Art Institute 32 358
Texas A&M University 0 1,201
Universidad Complutense 2 112,153
University of California 20,514 3,520,634
The University of Chicago 12,459 51,630
University of Delaware 9 28
University of Florida 0 9,866
University of Illinois 6,600 142,899
University of Massachusetts, Amherst 0 11,115
University of Michigan 16,823 4,689,072
University of Minnesota 303 120,180
University of North Carolina, Chapel Hill 0 17,025
University of Virginia 377 51,202
University of Wisconsin 1,151 557,252
Utah State 0 117
Yale University 0 23,678
Total 117,586 11,262,697

Public Domain (~34%)

Total* 92,066 3,848,472

* Includes volumes opened through copyright review and rights holder permissions

Summary of Issues Received by User Support

Issue Type                            June 2014   May 2014
Content                               168         131
  Quality                             157         124
  Collections                         10          7
Cataloging                            163         285
Access and Use                        188         142
  Copyright                           125         88
  Permissions                         6           6
  Takedown                            0           0
  Print on Demand                     0           0
  Inter-library loan                  2           0
  Full-PDF or e-copy requests         2           17
  Datasets                            0           3
  Data Availability and APIs          3           3
  Reuse of content                    3           4
Web applications                      18          18
  Functionality problems              7           8
  Problems with login specifically    2           1
  General Questions about Login       1           0
  Partners setting up login           1           0
  Usability issues                    0           0
  Feature requests                    2           1
Partner Ingest                        4           7
General                               86          93
  Partnership                         7           7
  Miscellaneous                       79          86
Total                                 627         676

Most Accessed Volumes

Title
Advanced accounts; a manual of advanced book-keeping, by R.N. Carter
The Human Figure, by John H. Vanderpoel
Consumption of the Lungs and Kindred Diseases, Treated and Cured by Kerosene, by Charles Oscar Frye.
Hortus gallicus pro Gallis in Gallia scriptus... / Symphoriano Ca[m]pegio ... authore; [Analogia medicinarum indaru[m] et gallicaru[m]
Investigation of the Ukrainian Famine, 1932-1933: report to Congress, by the Commission on the Ukraine Famine.
Liberty bell, a collection of original poems by Abraham Lewis
The Book of a Hundred Hands, by George Brant Bridgman.
Roster of the Confederate soldiers of Georgia, 1861-1865, v.2.
Roster of the Confederate soldiers of Georgia, 1861-1865, v.3.
Roster of the Confederate soldiers of Georgia, 1861-1865, v.5.

July Forecast

  • Correct a bug in navigation of large-scale search results.
  • Continue work on new Image Server capabilities for continuous text content.
  • Reassess accessibility features of PageTurner with particular attention to supporting new content types.
  • Improve processes for building and indexing collections, and improve sorting of serial publications in the Collection Builder application. 

Papers & Presentations

Partner Presentations