Update on October 2008 Activities
November 14, 2008
- The Committee on Institutional Cooperation (CIC) and the University of California join forces to launch HathiTrust – On October 13th, the 13 libraries of the CIC and the 11 libraries of the University of California (UC) system jointly announced the launch of HathiTrust. The California Digital Library will coordinate UC’s participation. The University of Virginia joins HathiTrust as one of the first participants using the infrastructure built by the partners.
- Center for Research Libraries Audit – The Center for Research Libraries and HathiTrust are developing plans for an independent assessment of the HathiTrust repository, based largely on the Trustworthy Repositories Audit & Certification (TRAC) criteria. The scope, depth and timing of the assessment will be determined within the next few months.
- Deployment Status
- Establishing Indiana mirror site – In October, Indiana IT staff readied data center space for our second instance of storage. Michigan staff disassembled and shipped the six pallets of equipment from Ann Arbor and coordinated an on-site visit with Indiana IT staff and the storage vendor to bring the equipment online. Several days later, HathiTrust started the data synchronization process. The next step is to configure the web and index servers so that, once data synchronization is routinized, the mirror site can be put into service.
- New Storage – The storage system in Michigan was upgraded from 100 TB to approximately 190 TB with new equipment ordered in September. The upgrade was performed without disrupting access to the system. The same amount of new storage was delivered to Indiana and brought online as part of the previously mentioned site work. Both sites now have approximately 190 TB of usable capacity online.
- Development Update
- Large-scale Search – As outlined in the Large-Scale Search report of October 10th (http://www.hathitrust.org/large_scale_search), we began benchmarking search results for a set of 5,000 queries against indexes sized in regular increments, from 50,000 documents to 1,000,000 documents. October work included completing index building, establishing a query set, running all queries against each index first with 4GB of RAM and then with 8GB of RAM, and producing preliminary data. As expected, these steps have helped to define potential strategies for indexing and providing access to millions of volumes. A public report on memory-related benchmarking will be available in late November or early December. Also, as planned, we made significant progress on producing a public beta of a full-text search mechanism for all of the public domain documents in the HathiTrust repository. [Note: the public beta was released on November 4th. See http://babel.hathitrust.org/cgi/ls/.]
- Future Strategies – As mentioned in last month’s update, we have embarked on a three-part strategy to facilitate development by, or in collaboration with, partner institutions. A summary of that strategy is included in the report on functional objectives. During October, preliminary API design discussions took place at Michigan. These discussions focused on specifying baseline functionality and logical isolation of repository services.
- 340,226 volumes were added in October.
- As of November 1st, the repository contained a total of 2,243,896 volumes.
- 24,748 public domain volumes were added in October, bringing the total number of public domain volumes to 344,342 (15% of the total content).
- Ingest of Wisconsin materials continued throughout October. As of November 3rd, HathiTrust contained 132,949 Wisconsin volumes.
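The Large-Scale Search benchmarking described above (every query in the set run against each index size, first with 4GB and then with 8GB of RAM) amounts to filling in a matrix of timings. The sketch below shows that structure only. The `run_query` callable is hypothetical and stands in for whatever search-engine call is actually being timed. The 50,000-document step is an assumption, because the report says only "regular increments". The summary statistics shown (mean, median, max) are illustrative choices.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical signature: run_query(index_size, ram_gb, query) executes one
# query against the index of the given size on a server with ram_gb of RAM.
QueryRunner = Callable[[int, int, str], object]


def benchmark(
    index_sizes: Sequence[int],
    ram_configs_gb: Sequence[int],
    queries: Sequence[str],
    run_query: QueryRunner,
) -> Dict[Tuple[int, int], Dict[str, float]]:
    """Time every query against every index size, once per RAM configuration."""
    results: Dict[Tuple[int, int], Dict[str, float]] = {}
    # Outer loop over RAM mirrors "4GB of RAM and then 8GB of RAM".
    for ram_gb in ram_configs_gb:
        for size in index_sizes:
            timings: List[float] = []
            for query in queries:
                start = time.perf_counter()
                run_query(size, ram_gb, query)
                timings.append(time.perf_counter() - start)
            # One summary row per (index size, RAM) cell of the matrix.
            results[(size, ram_gb)] = {
                "mean_s": statistics.mean(timings),
                "median_s": statistics.median(timings),
                "max_s": max(timings),
            }
    return results


if __name__ == "__main__":
    # Assumed 50,000-document increments from 50,000 to 1,000,000 documents.
    sizes = list(range(50_000, 1_000_001, 50_000))
    queries = [f"query {i}" for i in range(5_000)]

    def fake_run_query(size: int, ram_gb: int, query: str) -> None:
        pass  # stand-in for a real search-engine call

    table = benchmark(sizes, [4, 8], queries, fake_run_query)
    print(len(table))  # 20 index sizes x 2 RAM settings = 40 cells
```

The RAM setting is kept as the outer loop because switching memory configurations is presumably the expensive step, so all index sizes are run under one setting before moving to the next.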
Forecast for November development
- Complete synchronization of Ann Arbor and Indianapolis sites.
- Continuation of full-text research and increased work on page turner API development.
PLEASE NOTE: We still do not have institutional contact email addresses for notifications. As the service becomes more widely used, email will be an essential means of communication. Please contact Chris Butchart-Bailey (chrisbu at umich.edu) with email addresses of individuals or groups that should be added to our system outage mailing list.
We schedule maintenance work that requires a system outage during time windows (Eastern time) when academic user activity is generally lowest:
- For major work, Friday evenings (8pm-1am) and Sunday mornings (5am-10am);
- For minor work, weekdays from 6:30am-8am.
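As an illustration only, the maintenance windows listed above can be expressed as a small check. This is a hypothetical helper, not a HathiTrust tool. It assumes the datetime passed in has already been converted to Eastern time. It also treats each window as starting at its opening time and ending just before its closing time.

```python
from datetime import datetime, time
from typing import Optional

# datetime.weekday() numbering: Monday == 0 ... Sunday == 6.
FRIDAY, SATURDAY, SUNDAY = 4, 5, 6


def maintenance_window(dt: datetime) -> Optional[str]:
    """Return "major", "minor", or None for a naive datetime in Eastern time.

    Hypothetical helper: it assumes dt is already expressed in Eastern time.
    """
    day, t = dt.weekday(), dt.time()
    # Major work: Friday 8pm through 1am Saturday (the window spans midnight).
    if (day == FRIDAY and t >= time(20, 0)) or (day == SATURDAY and t < time(1, 0)):
        return "major"
    # Major work: Sunday 5am-10am.
    if day == SUNDAY and time(5, 0) <= t < time(10, 0):
        return "major"
    # Minor work: weekdays (Monday-Friday) 6:30am-8am.
    if day < SATURDAY and time(6, 30) <= t < time(8, 0):
        return "minor"
    return None


if __name__ == "__main__":
    # November 14, 2008 was a Friday; November 17, 2008 was a Monday.
    print(maintenance_window(datetime(2008, 11, 14, 21, 0)))  # major
    print(maintenance_window(datetime(2008, 11, 17, 7, 0)))   # minor
    print(maintenance_window(datetime(2008, 11, 17, 12, 0)))  # None
```

The Friday window is the only one that crosses midnight, so it is split into a Friday-evening test and an early-Saturday test rather than a single time-range comparison.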
Advance notice of scheduled outages is given on business days, at least 24 hours before the outage. Notice of unscheduled outages is given upon discovery, with additional updates as appropriate.
- Outages in October: None.
- Outages planned for November/December: No outages are planned at this time.