Strategic Advisory Board Meeting Minutes - May 20, 2010

HathiTrust Strategic Advisory Board
Conference Call minutes, Thursday, May 20, 2010
2:00 PM – 3:30 PM (Central time)

Sarah Pritchard, recorder.

1. Review and approve SAB minutes from 4/15/2010.  Minor spelling corrections were noted.

2. HT Operations/Development update.  John Wilkin reported.  Major news items include:

  • Shibboleth protocols will be launched by the end of May.  Authenticated users will be able to download full books and use the Collection Builder.
  • The second instantiation of large-scale search is about to come up at the Indiana site.  This will help ensure that there are no outages for this service.
  • There are now 1 million public domain volumes in HT, and 6 million volumes overall.  There are some minor differences between HT and Google Books as to what is in the public domain; SAB recommends this be explained in the FAQ.
  • NYPL is becoming a partner in HT and will retrospectively load all of the files from their several years of Google scanning.  The SAB proposes to pursue more in-depth discussions in the coming months about how to incorporate new partners effectively into governance and committees.

3. Committee charges:  Charges and reporting were clarified for new committees on Communication, and on User Research.  John sent updated drafts.  The charge and membership for the Communications Working Group were subsequently approved by the Executive Committee and can be found online at

4. Update on the Mellon workshop at Northwestern about digital humanities text initiatives: Sarah Pritchard reported on this one-day discussion meeting led by Martin Mueller, NU English Dept.  The focus is on strategies for curating digital corpora of humanities texts (especially Early Modern literature), and on the possible use of crowdsourcing to correct and annotate texts such as the works in the ESTC.  The workshop looked at the content and approaches of ESTC, the NINES project, HathiTrust, TCP, and relevant computer science research projects.  Sarah participated, and briefly updated the group on HathiTrust's ongoing work in areas of 1) quality control, and 2) subsetting.  The workshop identified two key needs: collaborative editing tools, and better metadata and text linkages among corpora and relevant bibliographical listings.  The ESTC may contact HathiTrust in this regard.  A report will follow shortly.

5. Working Group Updates:

  • Discovery interface:  John Butler.  Three areas of progress –

1) HathiTrust WorldCat Local (HT-WCL) Record Loads: WorldCat now contains over 2.5 million records for HathiTrust titles.

2) The HT-WCL software installation scheduled for May 16 was delayed for additional testing. OCLC is working to identify the earliest date possible to move forward with the install.

3) The HathiTrust Full-Text Search Services Assessment and Development effort is being launched as a subgroup of the working group.

  • Error rate:  Paul Soderdahl.

1) Error Rate WG began drafting a "gating scenarios" document, identifying pros and cons of each scenario and projecting related development requirements and magnitude of costs for each. Current scenarios being explored: (a) gating at ingest, (b) gating at access, (c) no gating but disclose QC info to users, (d) no gating at all.

2) Now working on refining the various gating scenarios and developing a principles framework document. Will need to discuss applicability, if any, to non-Google objects. At some point, will need help getting a sense of efficiencies and costs for gating at ingest vs. access.

3) Question to SAB: Is HathiTrust intending to allow localized (partner-specific) decisions and practices or will all partners need to abide by the same rules?

  • Google surrogates:  Bernie Hurley, Ed Van Gemert.  A draft working group charge was discussed, along with a helpful chart of the issues prepared by Paul Soderdahl.  The SAB will recommend to the HT Executive Ctte. that this group move ahead.
  • Collections Committee:  Sarah Pritchard, John Wilkin.  The charge has been revised and will be sent to SAB for a final review and then on to the Executive Ctte.; names were discussed for appointment to the group.  There was discussion of the nature of this as a standing committee, not just a working group, and what the implications of that are more broadly for HT governance.