Updates are provided in relation to the milestones listed at HathiTrust Research Center Timeline and Deliverables.
Progress: HTRC is working with a 50,000-volume collection of materials digitized from the IU library and a 250,000-volume collection of non-Google digitized content. Both collections are stored at IU on a NAS (Network Attached Storage) unit and are regularly synchronized with the main HT collection at Michigan using the Unix tool rsync.
Progress: HTRC received a 3-year grant from the Alfred P. Sloan Foundation to build this prototype system. The system will prove experimentally and theoretically that it is possible to comply with the non-consumptive constraint in computational research. It will serve as a platform on which the broad research community can develop, test, and execute new algorithms that run at scale on the HathiTrust corpus. This research involves Atul Prakash of the University of Michigan.
Progress: HTRC staff set up a development portal for HTRC. The portal is built using the Lift framework. A key aspect of the portal implementation is its support for InCommon identity and access management, which enables a user to log in using their home university credentials. The portal is consequently more secure because HTRC does not need to manage identity itself, and users also benefit from InCommon because they are not required to remember another user ID and password.
Progress: In this first phase HTRC is working on setting up core infrastructure components, including the portal, InCommon sign-on, a service registry, Solr indexes, and file system and database storage for the collections. Staff are also working on infrastructure for user-created collections and experimenting with text-mining techniques for improving descriptive metadata across the collections. Finally, a set of 60,000 rules developed with the aid of domain experts is being applied to correct OCR errors across the collection.
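As an illustration of rule-based OCR correction, the following is a minimal sketch only. The report does not describe the form of the 60,000 rules, so this example assumes simple whole-word substitutions; the sample rules and the names `SAMPLE_RULES` and `correct_ocr` are hypothetical and are not taken from the HTRC rule set.

```python
import re

# Hypothetical sample rules (misread word -> corrected word). The real
# HTRC rule set is not described in this report.
SAMPLE_RULES = {
    "tbe": "the",
    "wbich": "which",
    "arid": "and",  # a context-blind rule like this can also hurt real words
}

WORD = re.compile(r"\w+")


def correct_ocr(text: str, rules: dict) -> str:
    """Replace every whole word that has an entry in `rules`.

    Each word is looked up once in a dictionary, so the cost per word
    does not grow with the number of rules.
    """
    return WORD.sub(lambda m: rules.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    print(correct_ocr("tbe book wbich he read", SAMPLE_RULES))
```

A dictionary lookup per word is used here rather than one regular expression per rule, since tens of thousands of separate passes over each volume would be slow. Rules that depend on context would need a richer representation than this sketch provides.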
Progress: HTRC staff demonstrated SEASR running against a small HTRC collection at the Digital Humanities Conference in June 2011, using a collection of 50,000 volumes from the Indiana University collection in the HathiTrust. The content was prepared for this use by flattening the HT internal pairtree and converting bibliographic data to RIS format (http://www.refman.com/support/risformat_intro.asp). Future work includes integrating SEASR into the HTRC portal infrastructure by supporting InCommon identity and access management. Other projects currently in progress include scaling to access a large remote data collection and ensuring algorithm integrity against the copyrighted collection, particularly in the face of users' ability to rewire workflows at will.
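The two preparation steps mentioned above can be sketched roughly as follows. This is an illustrative sketch, not the code HTRC used. It assumes the identifier-to-path mapping of the Pairtree specification and a minimal RIS record for a book; the function names and the record fields (`title`, `authors`, `year`) are hypothetical.

```python
# Characters the Pairtree specification hex-encodes as ^hh.
HEX_ENCODE = set('"*+,<=>?\\^|')
# Single-character substitutions applied after hex-encoding.
CHAR_MAP = str.maketrans({"/": "=", ":": "+", ".": ","})


def clean_id(identifier: str) -> str:
    """Make an identifier safe for use as a file or directory name."""
    out = []
    for ch in identifier:
        if ch in HEX_ENCODE or not (0x21 <= ord(ch) <= 0x7E):
            out.extend(f"^{b:02x}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out).translate(CHAR_MAP)


def pairtree_path(identifier: str) -> str:
    """Nested pairtree path: the cleaned identifier in two-character steps."""
    cleaned = clean_id(identifier)
    return "/".join(cleaned[i:i + 2] for i in range(0, len(cleaned), 2))


def to_ris(record: dict) -> str:
    """Render a bibliographic record as a minimal RIS entry for a book."""
    lines = ["TY  - BOOK", f"TI  - {record['title']}"]
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")
    if "year" in record:
        lines.append(f"PY  - {record['year']}")
    lines.append("ER  - ")  # end-of-record tag
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(pairtree_path("ark:/13960/t0000xq1b"))
    print(to_ris({"title": "Moby Dick",
                  "authors": ["Melville, Herman"],
                  "year": "1851"}))
```

Flattening then amounts to storing each volume under a single name derived from `clean_id` in one directory, rather than under the nested path returned by `pairtree_path`.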
Progress: HTRC has chosen the InCommon framework for trustworthy shared management of access to online resources. Researchers have single sign-on convenience using their existing credentials at their host organization, which eliminates the need to create additional accounts. InCommon uses Shibboleth or other SAML-compliant software to exchange attributes with partners, providing only the information necessary for authentication and authorization. The InCommon Federation provides the policy and technical framework that makes all of this possible. As of a recent count, all but 15 of the members of HathiTrust are members of the InCommon Federation. We anticipate that membership will grow to 100% of HathiTrust members.
Progress: An official kick-off of the HTRC was held at the Digital Humanities Conference in Palo Alto, CA, on June 20, 2011. The HathiTrust Research Center team has given 7 presentations to various other groups and conferences.
Progress: HTRC has co-sponsored three grant proposals with institutions inside and outside the HathiTrust partnership community.
Progress: HTRC has met with Project Bamboo on multiple occasions as part of continuing discussions.
Progress: The Alfred P. Sloan Foundation award is a step towards sustainability. We are working on a long-term sustainability plan.