December 12, 2008
- Deployment Status
- Establishing Indiana mirror site – After a period of troubleshooting, network performance tuning, and a number of network hardware upgrades, data synchronization between Ann Arbor and Indianapolis was completed and routinized.
- Development Update
- Large-scale Search – Work continued on benchmarking query performance on large-scale indexes with various hardware configurations. Memory was increased to 8GB, and previous tests on indexes ranging in size from 50,000 to 1,000,000 documents were repeated. A summary of our approach and results from the first two (of five) stages of search benchmarking is available at http://www.hathitrust.org/technical_reports/Large-Scale-Search.pdf. Seeking finer granularity, we built and tested indexes ranging from 10,000 to 100,000 documents, but the results simply confirmed what we had already discovered: query response time increases linearly with the size of the index. We also developed a script to automate the process of building an index of multiple shards that can be distributed logically across hardware to improve performance. This script was immediately put to work building 2-shard indexes, which are expected to be ready for testing in the first week of December. [The public beta was released on November 4th. See http://babel.hathitrust.org/cgi/ls/.] Results of these tests, and others in successive stages of benchmarking, will be published at the technical report URL above as they are completed.
- Future Strategies – As mentioned in the September update, we have embarked on a three-part strategy to facilitate development by, or in collaboration with, partner institutions. A summary of that strategy is included in the report on functional objectives. During November, progress was made on the API design specification. The API will provide an easy and reliable way for third parties to integrate HathiTrust content into custom applications.
- 172,945 volumes were added in November.
- As of December 1st, the repository contained a total of 2,416,841 volumes.
- 23,013 public domain volumes were added in November, bringing the total number of public domain volumes to 367,355 (15% of the total content).
- Ingest of Wisconsin materials continued in November. As of December 1, 2008, HathiTrust contained 138,478 Wisconsin volumes.
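The percentage quoted in the statistics above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures reported in this update; the variable and function names are illustrative, and the share of November additions is derived here rather than stated in the report.

```python
# Figures copied from the November statistics above.
total_volumes = 2_416_841          # repository total as of December 1st
public_domain_total = 367_355      # public domain volumes in the repository
added_in_november = 172_945        # all volumes added in November
public_domain_added = 23_013       # public domain volumes added in November


def share(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Public domain share of the whole repository (reported as 15%).
pd_share = share(public_domain_total, total_volumes)

# Derived figure, not stated in the report: public domain share of
# the volumes added during November.
pd_share_of_additions = share(public_domain_added, added_in_november)

print(f"Public domain share of repository: {pd_share:.1f}%")
print(f"Public domain share of November additions: {pd_share_of_additions:.1f}%")
```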
Forecast for December development
- Complete web hosting infrastructure work at the Indiana site.
- Begin developing procedures for releasing web applications and synchronizing search indexes to the Indiana site, and begin adapting web applications and search functionality to work transparently across both mirrors.
- Benchmarking of query performance with large-scale indexes will continue in December. We plan to complete six specific tests that will compare indexes composed of two shards in a variety of configurations, using one and two servers and up to 32GB of RAM. A specific test comparing direct-attached and network-attached storage will also be conducted. Each round of tests provides insights into the pros and cons of various hardware configurations. The goal is to gather data sufficient to specify an architecture capable of providing public searching on the full text of at least ten million books.
- We expect to complete the first written draft of the API design specification in December.
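To illustrate what splitting an index into two shards involves, here is a minimal sketch of one common way to assign documents to shards. It is not the script described in this update: the hash-based assignment, the function names, and the document IDs are all assumptions made for illustration, and the actual indexes may be partitioned differently.

```python
import zlib


def shard_for(doc_id: str, num_shards: int = 2) -> int:
    """Map a document ID to a shard number using a stable CRC32 hash.

    Hash-based assignment is an assumption for this sketch; any rule
    that sends each document to exactly one shard would serve.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def partition(doc_ids, num_shards: int = 2):
    """Group document IDs into `num_shards` lists, one list per shard."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


# Hypothetical document IDs; real volume identifiers will differ.
doc_ids = [f"vol-{n:06d}" for n in range(1000)]
shards = partition(doc_ids)

for number, members in enumerate(shards):
    print(f"shard {number}: {len(members)} documents")
```

Because each shard holds only part of the collection, each can be built and queried on separate hardware, which is the property the two-shard tests are meant to evaluate.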
PLEASE NOTE: Contact Chris Butchart-Bailey (chrisbu at umich.edu) with email addresses of individuals or groups that should be added to our system outage mailing list.
We schedule system maintenance work that requires a system outage during time windows (in Eastern time) when academic user activity is generally lowest:
- For major work, Friday evenings (8pm-1am) and Sunday mornings (5am-10am);
- For minor work, weekdays from 6:30am-8am.
Notice of scheduled outages is given on business days, at least 24 hours in advance. Notice of unscheduled outages is given upon discovery, and additional updates are given as appropriate.
- Outages in November: On Tuesday, November 4 at 7:30am EST, HathiTrust was down briefly to apply security updates to a database server. Service was restored at 7:45am EST.
- Outages planned for December/January: A brief outage will be scheduled in December or January for a storage system software upgrade.