May 8, 2009
Temporary Beta Catalog Released – HathiTrust reached a major milestone on April 24 with the release of a temporary beta catalog. The catalog provides bibliographic search and faceted browsing of all volumes in HathiTrust, and integrates with the HathiTrust Page Turner to provide access to individual items. It can be accessed at http://catalog.hathitrust.org. Further integration with the HathiTrust Collection Builder, along with other enhancements, is planned for a second phase of development. This catalog (including phases 1 and 2) is temporary, pending the release of the permanent HathiTrust catalog to be developed by OCLC in conjunction with HathiTrust partners (see the final news item on this page for details). Michigan and California also discovered a strong mutual interest in improving functionality in the HathiTrust Page Turner, which will be explored further in May.
Ingest Started from Indiana University and the University of California – In a month of significant developments for HathiTrust, ingest of content from both Indiana University and the University of California began in April. The loading of bibliographic metadata for the initial set of Indiana volumes was completed, and approximately 10,100 volumes had been ingested by May 1. Bibliographic loading continues for the University of California, and ingest started late in April. Several hundred volumes are now available in the repository.
HathiTrust-WorldCat Local Project – In April, the HathiTrust-WorldCat Local Implementation project team, consisting of members from HathiTrust libraries and OCLC, met in Chicago to begin the process of creating a production-level bibliographic discovery interface for HathiTrust. The initial version, due out in the first quarter of 2010, will build on WorldCat Local’s standard functionality and will be tailored to HathiTrust’s entirely digital collection. The design of HathiTrust’s recently released temporary beta catalog (using VuFind) will also inform this project’s requirements and interface design. Some of the project’s overall priorities will be:
Storage – New storage that was purchased in March has been installed and is operational at the Michigan site. The storage at Indiana has been received, and installation is scheduled for mid-May.
Large-scale Search – Michigan and California developers shared experiences and ideas in a fruitful discussion about Lucene-based search engines, XTF and Solr. Investigations into software solutions for improving response times for slow queries led us to add common-gram indexing and searching capabilities to Solr, significantly improving performance of slow phrase queries. Common-grams increase index size, but the difference so far seems to be manageable and worthwhile. We are continuing to refine a hardware configuration for Solr servers, based on discussions of indexing workflows and continuing research into different indexing algorithms, which affect storage requirements.
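For readers unfamiliar with common-grams, the idea can be illustrated with a minimal Python sketch. It is a simplified illustration only, not the code added to Solr: the word list, the function name `common_grams`, and the `_` separator are assumptions made for this example. Each token is emitted unchanged, and whenever a token or its neighbor is a very frequent word, the pair is also emitted as a single combined token.

```python
# Hypothetical list of very frequent words; a real deployment would
# derive this from the collection being indexed.
COMMON_WORDS = {"the", "of", "a", "to", "and", "in"}


def common_grams(tokens, common=COMMON_WORDS, sep="_"):
    """Return the tokens plus a bigram for each adjacent pair that
    contains at least one common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)  # always keep the original token
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common or nxt in common:
                out.append(tok + sep + nxt)  # extra combined token
    return out


print(common_grams("the rain in spain".split()))
```

A phrase query such as "the rain" can then be matched against the much rarer combined token `the_rain` rather than by intersecting the long position lists for "the" and "rain". The extra combined tokens are also why the index grows.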
Data API – California Digital Library (CDL) staff provided useful feedback on the first draft of a functional specification for the HathiTrust Data API, and a response is in the works. The draft is online (http://www.hathitrust.org/hathitrust_data_api). Coding of an alpha version of the Data API is done, and limited use of the API will start in May; CDL will use it to validate ingest of UC content into the repository.
Development Environment – We are in the early stages of conceiving a new development environment for building and testing repository applications and services. Server hardware has been allocated to this purpose and setup will take place as design discussions progress.
PLEASE NOTE: Please contact Chris Butchart-Bailey (chrisbu at umich.edu) with email addresses of individuals or groups that should be added to our system outage mailing list to receive information about unscheduled outages.
We schedule system maintenance work that requires a system outage during time windows (in Eastern time) when academic user activity is generally lowest:
Advance notice for scheduled outages is given on business days, at least 24 hours in advance. Notice of unscheduled outages is given upon discovery, and additional updates are given as appropriate.