Update on February 2009 Activities
March 13, 2009
- General News
- Strategic Advisory Board – Significant progress has been made towards the formation and formal charge of a Strategic Advisory Board for HathiTrust. The Strategic Advisory Board will guide HathiTrust development efforts, convene task forces to address specific issues such as cross-institutional technology development and de-duplication, and develop policies for HathiTrust and its partners.
- Coordination between UM and UC Staff – In March, teams from the University of Michigan and the University of California continued a series of working meetings on HathiTrust technologies and infrastructure to prepare for ingest of UC content. These meetings have been positive and informative on all sides. Once current processes have been adjusted, including the ability to incorporate bibliographic data and OCR with coordinate information into the ingest package, volumes from UC will begin entering the repository.
- Datasets – Extraction of volumes from the repository to form sample datasets for research is set to begin shortly. For a description of the datasets to be made available, please see the HathiTrust Long-term Functional Objectives.
- HathiTrust growth – Ingest rates increased slightly in February, to just over 150,000 volumes. The rate will rise further once ingest of content from Indiana University and the University of California begins, which we hope will happen in late March or early April.
- Deployment Status
- Establishing Indiana Mirror Site – Web hosting infrastructure work was completed at the Indiana site, but a software problem introduced further delays. The problem has been resolved, and with all preparation nearly complete, we expect to bring the site online in March.
- Development Update
- Ingest – Work is underway to adapt the ingest system to the new version of Google's delivery system (GRIN) and to accommodate UC content. Notable developments include handling differences in UC content packages (variations in file naming and the inclusion of bibliographic and coordinate data in the archival package) and developing new ingest reports.
- Large-scale Search – Load testing was completed in February, with the exception of tests on smaller indexes to confirm limits on input/output operations per second. Strategies for hardware configuration and acquisition are now being explored.
- API – The draft functional specification for the HathiTrust Data API was discussed and refined but remains a work in progress. Implementation of the beta Data API is more than half complete, and the beta will soon be shared with partners for review.
- Replication – Additional work was done to support replication of full-text indexing on the repository instance in Indiana.
- Public Discovery Interface – Work on the temporary public beta HathiTrust discovery interface has begun. As described in the short-term functional objectives, we are adapting VuFind software to provide a comprehensive bibliographic search interface for temporary use while specification and development work on the more permanent, WorldCat Local-based interface is underway. A mock-up showing how this interface will integrate with the existing Collection Builder, Page Turner, and beta full-text search interfaces has been created, and functional modifications and customizations of the software are in progress.
- Repository Statistics
- 152,235 new volumes were added in February.
- As of March 1st, the repository contained a total of 2,650,188 volumes.
- 23,350 public domain volumes were added in February, bringing the total number of public domain volumes to 402,883 (15% of the total content).
- Ingest of Wisconsin materials continued. As of March 1, 2009, HathiTrust contained 159,314 Wisconsin volumes.
- Forecast for March development
- Complete remaining preparation at the Indiana site and put it into active service.
- Continue adapting ingest routines to accommodate UC content.
- Finish load testing for large-scale search and release an updated report on search benchmarking. Continue to investigate ways to improve performance for slow queries.
- Continue work on the HathiTrust Data API specification and gather input from a broader audience. Continue coding the initial Data API implementation.
- Continue development of the temporary public beta of a comprehensive bibliographic search interface.
PLEASE NOTE: Contact Chris Butchart-Bailey (chrisbu at umich.edu) with the email addresses of any individuals or groups that should be added to our system outage mailing list.
We schedule system maintenance that requires an outage during time windows (Eastern time) when academic user activity is generally lowest:
- For major work, Friday evenings (8pm-1am) and Sunday mornings (5am-10am);
- For minor work, weekdays from 6:30am-8am.
Notice of scheduled outages is given on a business day, at least 24 hours in advance. Notice of unscheduled outages is given upon discovery, with further updates as appropriate.
- Outages in February: On Sunday, February 22 at 8:40am EST, a power surge resulting from electrical system maintenance took the HathiTrust database and web servers offline. Staff learned of the problem at approximately 6:00pm EST, and service was restored by 6:30pm EST.
- Outages planned for March/April: No outages are planned at this time.