Strategic Advisory Board Meeting Minutes - January 14, 2010

HathiTrust Strategic Advisory Board
Conference Call
Thursday, January 14, 2010
2:00 PM – 3:30 PM (Central)
Participating: Ed van Gemert (chair), Sarah Pritchard, Bernie Hurley, Bruce Miller, John Butler, Jeremy York (for John Wilkin), Patricia Cruse, Paul Soderdahl (recorder). Guest: Paul Fogel.
1.     Approved SAB minutes for 11/24/09 conference call
2.     HT operations and development update, Jeremy York
a.     Shibboleth. Shib SP (Service Provider) service installed on development servers. Expect to have it in production sometime in Q1’10. Addendum: In response to questions from the Board about any relationship between HT’s Shib SP and MONK’s Shib SP (specifically for CIC institutions), Jeremy followed up after the meeting to report that they have not had contact to this point with Mike Grady at UIUC. HT is still in the preliminary stages of installation. They have been in contact with UMich’s Scholarly Publishing Office, which is also running a Shib SP (for UK users). John Wilkin added that HT Shib planning has involved CIC CIO IdM group leaders and information sharing with John Ober at CDL.
b.     Born digital content. They have been working with UMich’s Scholarly Publishing Office. They have some files from UM Press. Preliminary thoughts are to adopt ePub as the preferred format for delivery.
c.     Non-Google/non-IA digitized print content. UMich still in the process of hiring a programmer to work on this. Still accepting applications, but will close the search very soon. Goal is to hire one full-time programmer and/or other part-time developers. Addendum: Jeremy clarified after the meeting that this will be UMich staff funded by UMich.
d.     Non-print content. Still in exploratory phase. Looking at an audio pilot and an image pilot (primarily maps) to learn more about the metadata specifications, grouping, workflows, etc.
e.     Data API. Building an interface to demonstrate uses of the API (e.g., for content validation). Creating such an interface will also help finalize the API specifications. The API is currently being used for a UMich project on Islamic manuscripts. See
f.      Bibliographic API/Rights API. Formerly, the rights API returned a single identifier and a binary rights status. Rights functionality has been extended to return values for multiple items and an enhanced rights attribute. Working on further developing the API to receive an HT record number and return MARC XML.
g.     Collaborative development environment. Building a proof-of-concept implementation of the development environment to bring to the WG. Planning to use the page turner for the proof of concept. Sebastien Korner (UMich) is the lead.
h.     HT research center. The RFP that was drafted by the partner institutions last summer was approved last month by the Executive Committee and will be posted on the website soon.
i.      Paul Conway’s Mellon grant on validation of digital objects. An advisory board has been assembled for the grant. Ed will represent SAB. Bruce mentioned that they were asked to contribute to the cost share but will be unable to do so. Sarah mentioned that the CIC Library Directors were similarly approached and most had the same reaction. Sarah reported there is still some talk of whether CIC itself might have something to offer.
j.      Feasibility Study for Open-Access NSF Publication Repository. NSF-funded planning grant for Johns Hopkins, Michigan, and CLIR to study feasibility of open access repository for NSF-funded research. Meeting planned at end of January to discuss technical specs. Addendum: Jeremy clarified after the meeting that the NSF planning grant may evolve into a full NSF proposal.
k.     TRAC audit. Responding to requests for documentation from CRL. In some cases, HT documentation describes current practice rather than constituting a formal policy document. Responding to questions about quantifying partner institutions’ contributions has proven difficult. Jeremy is putting together a framework document describing roles.
l.      Internet Archive ingest. Finalizing specs. Expecting to do a pilot ingest of UC volumes by the end of January.
m.     Large-scale search. Refining the indexing process. As the index grows, optimizing it requires increasing resources, which will affect the schedule for acquiring new hardware. Currently working on best practices for indexing.
n.     Discovery interface. Planning to move beyond phase 1 remains on track. See separate agenda item below.
3.     Update on the error rate working group, Paul Fogel
a.     Report and recommendations from the Error Rate and HT Ingest WG attached.
b.     Paul reported that the new “crap detector” seems promising as a replacement for the error rate metric because it incorporates manual review which should better reflect end user experience. Google will be adding the new metric to GRIN to allow HT to evaluate.
c.     Board members discussed several issues at length, including:
      i.     Quality control as it affects user experience;
     ii.     Concerns about rejecting any Google content, regardless of quality;
    iii.     Acknowledging that the 15% threshold may be a close match to the original 3%, but concern that the original 3% value was arbitrary at the time and based on a flawed metric;
     iv.     Implications of gating-at-ingest for non-Google content.
d.     Board consensus was to affirm maintaining the status quo of 15% until the new metric and the progress of the multi-institutional working group have been reviewed.
e.     Board consensus also was to extend the WG and ask the group for an expanded report on the advantages and disadvantages of not gating and to recommend a set of quality principles.
ACTION ITEM: Ed will work with Paul Fogel to recast the WG’s charge in light of the Board discussion.
4.     Update on the discovery interface working group, John Butler
a.     Due to time constraints, John will submit report in writing. See report below.
5.     Discussion—volume duplication within HT, Ed van Gemert
a.     Discussion postponed.
b.     Addendum. After the meeting, Ed writes: “I was planning on making a pitch to the SAB to establish a group to evaluate the issues of de-duplication of files within HathiTrust. John Wilkin had laid out a draft charge for the review of duplicates in the ‘Dear Ed’ letter of 27 April, 2009. I would like for us to have this discussion at an upcoming meeting. As part of our implementation of a new resource discovery tool at Wisconsin, we’ve done some of the work that would be needed. We think we could assist a larger effort. I'll make my pitch at a later date.”
6.     New business: ingest reporting
a.     Discussion postponed.
b.     Addendum. After the meeting, Ed writes: “Wisconsin is looking at what is required to outline a process for designing an ingest reporting system. We intend to consult with and work with any interested HathiTrust partner libraries. Our programmers are initiating this work now.”
7.     New business: extend access to restricted resources for users with print disabilities
a.     Discussion postponed.
b.     Addendum. After the meeting, Ed writes: “The CIC library directors will be discussing all of the issues of expanding accessibility services to HT files for users with print disabilities. That discussion will need to include an understanding of costs and timelines, a consideration of several options or approaches for getting the work done, and figure out how best to include identity management staff, legal services, and access services staff from around the CIC. If we decide to tackle this project we would invite and plan to collaborate with any interested HathiTrust partner libraries.”
Adjourned: 3:45pm CST.
Update on the discovery interface working group, John Butler
Discovery Interface Working Group Report

The OCLC development team, with which HathiTrust is collaborating, is completing software development for the HathiTrust data load into WorldCat Local (WCL) and is scheduled to install it in late February. Loading records into the HathiTrust WCL version 1 will begin in March 2010.

Looking towards version 2 of the HathiTrust catalog, the working group has been synthesizing the roadmap discussion it had with the OCLC development team at its joint meeting in Chicago in late November. There is basic agreement on the goals for development in FY2011, but some concern about whether the approach OCLC is asserting will provide the "openness" that HathiTrust requires. Concurrent with those discussions, the working group is beginning to review its scope and the membership it will need as the group’s purview expands beyond bibliographic metadata to include the integration of other features, such as full-text search and the HathiTrust Collection Builder, into the user experience.

Reflecting this broadening scope, the group was renamed: HathiTrust Discovery Interface Working Group.  Also, the HathiTrust Executive Committee approved the proposal to have this working group report to the Strategic Advisory Board (SAB) to ensure stronger alignment between the development and delivery of discovery services with the overall direction of HathiTrust.