January 9, 2009
- General News
- Public ‘Discovery’ Interface and OCLC collaboration – The HathiTrust partners have concluded a preliminary agreement with OCLC to collaborate in adapting OCLC’s WorldCat Local as a public discovery interface for HathiTrust. Through a process led by members of partner libraries, the partners will work with OCLC on specifications, with a goal of deploying this interface in early 2010. In the meantime, we will continue to move forward with a temporary public beta using VuFind, to be deployed in early 2009. More information on this strategy is available in the Functional Objectives.
- Decrease in HathiTrust growth – After very fast growth, including two successive months in which more than 300,000 volumes were added to the repository, the rate of ingest has dropped dramatically. This month we added fewer than 100,000 volumes. We have largely caught up with the content currently available for ingest; ingest of content from the University of California system will begin soon.
- Deployment Status
- Establishing Indiana Mirror Site – The Indiana site is now a fully operational mirror of the Michigan site, and a complete tape backup also exists in Michigan. We continue work on configuring the Indiana site to be a parallel web hosting environment with the Michigan site, using load balancing and failover.
- Development Update
- Large-scale Search – Solr indexes of various sizes up to 1 million documents were tested against six hardware configurations, including multiple shards on individual machines, and individual shards on multiple machines, with RAM configurations ranging from 8GB to 32GB. As planned, we completed these tests in December and are now working on data analysis. We continue to work toward a goal of being able to specify the hardware and software required to support full text searching (with Solr) of all volumes projected to be in the repository.
- API – We have completed a rough outline of the HathiTrust content access API. As part of this work, we explored API security issues in depth and identified the most likely solutions.
- Bibliographic Data Distribution – The University of Michigan OAI data provider now identifies its records as coming from HathiTrust Digital Library (rather than mbooks, as was previously the case). We have communicated this change to major harvesters of the sets. Based on feedback solicited from those who have harvested our records in the past, we have modified the MARC and oai_dc formats to correct and amplify the information we provide. For instance, the 245 field now includes the statement of responsibility (subfield c). For more information see http://www.hathitrust.org/bibliographic_data_distribution.
- 61,030 new volumes were added in December.
- As of January 1st, 2009, the repository contained a total of 2,477,871 volumes.
- 4,730 public domain volumes were added in December, bringing the total number of public domain volumes to 372,085 (15% of the total content).
- Ingest of Wisconsin materials continued. As of January 1, 2009, HathiTrust contained 142,488 Wisconsin volumes.
- Forecast for January development
- Complete web hosting infrastructure work at the Indiana site and put it into active service.
- Results of the most recent large-scale search test will be evaluated and reported publicly. Next steps will be identified and more tests (e.g., on load) will be conducted. Possible areas of exploration include additional variations on hardware configuration and shard distribution, the impact on performance when faceting is employed, and indexing optimizations to improve performance.
- Server software will be enhanced to support replication of full-text indexing on the repository instance in Indiana.
- We hope to complete the first written draft of the API design specification in January.
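For harvesters adjusting to the MARC change noted under Bibliographic Data Distribution, the following is a minimal sketch of reading the 245 field, including the statement of responsibility in subfield c, from a MARCXML record. The sample record and the function name `title_statement` are illustrative assumptions, not HathiTrust data or code; only the standard MARCXML namespace and the 245 subfield codes are taken as given.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace (MARC 21 "slim" schema).
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Illustrative record only; not taken from HathiTrust data.
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">by A. N. Author.</subfield>
  </datafield>
</record>"""


def title_statement(record_xml):
    """Return the 245 subfields as a {code: text} dict (empty if no 245)."""
    root = ET.fromstring(record_xml)
    field = root.find("marc:datafield[@tag='245']", MARC_NS)
    if field is None:
        return {}
    return {
        sf.get("code"): (sf.text or "").strip()
        for sf in field.findall("marc:subfield", MARC_NS)
    }


fields = title_statement(SAMPLE_RECORD)
print(fields.get("a"), fields.get("c"))
```

Records harvested before the change may lack subfield c, so the sketch uses `get` rather than assuming the subfield is present.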
PLEASE NOTE: To add individuals or groups to our system outage mailing list, send their email addresses to Chris Butchart-Bailey (chrisbu at umich.edu).
We schedule system maintenance work that requires a system outage during time windows (in Eastern time) when academic user activity is generally lowest:
- For major work, Friday evenings (8pm-1am) and Sunday mornings (5am-10am);
- For minor work, weekdays from 6:30am-8am.
Notice of scheduled outages is given on business days, at least 24 hours in advance. Notice of unscheduled outages is given upon discovery, with additional updates as appropriate.
- Outages in December: On Friday, December 19 at 7:30am EST, HathiTrust was down briefly to apply security updates to a database server. Service was restored at 7:40am EST.
- Outages planned for January/February: A brief outage will be scheduled in January for a storage system software upgrade.
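As a rough illustration of the maintenance windows listed above, the sketch below checks whether a given Eastern-time moment falls inside a major or minor window. It rests on a few assumptions not stated in this update: "weekdays" means Monday through Friday, each window includes its start time and excludes its end time, and the input is a naive datetime already expressed in Eastern time. The names `WINDOWS` and `window_kind` are hypothetical.

```python
from datetime import datetime

MINUTES_PER_WEEK = 7 * 24 * 60


def _minute_of_week(weekday, hour, minute=0):
    """Minutes since Monday 00:00 (weekday follows datetime.weekday(), Mon=0)."""
    return weekday * 24 * 60 + hour * 60 + minute


# (kind, start minute of week, length in minutes)
WINDOWS = [
    ("major", _minute_of_week(4, 20), 5 * 60),  # Friday 8pm to 1am (crosses midnight)
    ("major", _minute_of_week(6, 5), 5 * 60),   # Sunday 5am to 10am
] + [
    ("minor", _minute_of_week(d, 6, 30), 90)    # Monday to Friday, 6:30am to 8am
    for d in range(5)
]


def window_kind(when):
    """Return 'major', 'minor', or None for a naive Eastern-time datetime."""
    now = _minute_of_week(when.weekday(), when.hour, when.minute)
    for kind, start, length in WINDOWS:
        # Modular distance handles the Friday window that runs into Saturday.
        if (now - start) % MINUTES_PER_WEEK < length:
            return kind
    return None


# The December 19 outage began at 7:30am on a Friday.
print(window_kind(datetime(2008, 12, 19, 7, 30)))
```

Storing each window as a start offset and a length, rather than as a start and end time of day, keeps the Friday evening window in one entry even though it ends on Saturday.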