Absent: Bernie Hurley
1. Approved SAB minutes for 18 February 2010 conference call
2. HT operations and development update—John Wilkin
- The University of California has almost finished the quality assurance checks for the ingest of Internet Archive material. If all stays on track, the digital files for 100,000 volumes should be loaded by the end of the month.
- The data API is proving to be a useful utility.
- Michigan has hired 1.7 FTE and has begun work on programming in support of ingest of non-Google content.
- Large-scale search infrastructure will likely be extended to Indiana University in April.
“Surrogates” and “duplicates” are not the same even though they raise some overlapping issues regarding record keeping, data management, and display of digital files for specific works. John Wilkin provided a focused overview of the surrogate issue via email to the SAB (Revised HT SAB agenda, Thursday, March 18, 2010, distributed on 15 March 2010).
Two other attachments to that message discuss the duplicate issue in depth:
- “New Google Duplicate Detection and Return Procedure: Impacts on HathiTrust Bibliographic and Item Level Metadata,” J. Rothman, October 27, 2009
- “Google Designated Duplicates: Implications for HathiTrust End User Display,” Heather Christenson, California Digital Library, 2/11/2010, Version 3.0
An estimated five percent of the records in the database are duplicates. To keep the discussion focused on surrogates, the topic of duplicates will be held for a future discussion.
To date, perhaps 100,000 records fall into the surrogate scenario. That scale is small enough that no immediate crisis forces a rapid resolution, but large enough that the issue should be sorted out methodically. In simple terms, the technical issue is to determine record-keeping details at the metadata level and to decide what to tell the end user and when to display that information.
ACTION: Ed, Trisha, and Paul will collaborate on draft text for a clear and concise executive summary to provide focus for ongoing discussion and action. The probable next step will be to charge a group to work through the issue and to propose technical infrastructure to support record keeping and display.
4. Discovery Interface Working Group Update—John Butler
- OCLC began loading HathiTrust records into WorldCat in February. Quality checks, error remediation, and testing were successful. Full record loading began 9 March. More than 3,000,000 records should be online by the end of May 2010.
- Usability testing conducted jointly by the working group and OCLC usability staff with Penn State end users went well. A draft report is expected soon.
- SAB was reminded of the call for membership for the Discovery Interface Working Group.
5. Error Rate Working Group—Paul Soderdahl
6. Collections Working Group
Ed, John W, and Ivy Anderson (CDL) have discussed the proposed Collections Working Group and concluded that a standing “purpose” group might be more appropriate. Ivy is drafting a charge for SAB review.
7. CIC/UC meeting, 4-5 March
Representatives from CIC and UC with responsibilities regarding HathiTrust met in Oakland to discuss common interests and how best to work together. Topics discussed included possible organizational structures, potential services that would extend the value of the HathiTrust dataset, and communication, especially between the Executive Committee and the SAB.
8. Cost model
Discussion deferred until the next conference call.