HathiTrust Updates

Update on December 2009 Activities

January 15, 2010

Top News           

Columbia Partnership – HathiTrust is very pleased to welcome Columbia University as its newest partner. A representative of HathiTrust will be travelling to Columbia in late January to give a full introduction to repository operations, current activities, and future plans. We look forward to the experience and expertise that Columbia will bring to the enterprise, and the new possibilities that are opening for HathiTrust as it continues to expand its membership and its collections. A full press release on the new partnership can be read at http://www.columbia.edu/cu/lweb/news/libraries/2009/20091216.hathi.html.

5 Million Volumes – A significant milestone was passed in December as HathiTrust exceeded 5 million volumes in digital holdings. More than three-quarters of a million of these are in the public domain. A steady rate of growth is expected to continue in 2010, and partner collections are projected to grow to more than 8 million volumes.

TRAC Audit – In early December, HathiTrust began a process with the Center for Research Libraries (CRL) to assess the digital repository in relation to the Trustworthy Repositories Audit and Certification (TRAC) criteria. The assessment is scheduled to proceed until mid-February, and the findings will be publicly available. More information about the audit can be found on the CRL website at http://www.crl.edu/archiving-preservation/digital-archives/certification....

Bib API – HathiTrust has released a new bibliographic API that enables retrieval of descriptive and rights information for objects in the repository based on standard identification numbers (e.g., ISBN, ISSN, LCCN, OCLC). The API is a replacement for the (now deprecated) Rights API, and the specification is available at http://www.hathitrust.org/bib_api.
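
As a rough illustration of identifier-based lookup, the sketch below builds a request URL from an identifier type and value. The base URL, the `brief`/`full` path segments, and the `.json` suffix are assumptions made for illustration and should be checked against the specification at http://www.hathitrust.org/bib_api; the helper name `build_bib_api_url` is hypothetical.

```python
from urllib.parse import quote

# Assumed values for illustration only; confirm against the published spec.
BASE_URL = "http://catalog.hathitrust.org/api/volumes"
ID_TYPES = {"isbn", "issn", "lccn", "oclc"}  # identifier types named in the update


def build_bib_api_url(id_type: str, value: str, detail: str = "brief") -> str:
    """Return a lookup URL for one identifier (hypothetical helper)."""
    id_type = id_type.lower()
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type}")
    if detail not in {"brief", "full"}:
        raise ValueError(f"unsupported detail level: {detail}")
    # Percent-encode the identifier so it is safe inside a URL path segment.
    return f"{BASE_URL}/{detail}/{id_type}/{quote(value.strip(), safe='')}.json"


print(build_bib_api_url("OCLC", "424023"))
```

The function only constructs a URL; fetching and parsing the response is left out because the response format is defined by the specification rather than by this update.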

Working Groups

Discovery Interface – OCLC is completing preparations for the import of HathiTrust data into WorldCat Local (WCL). The installation of a HathiTrust WCL instance is scheduled to be complete in late February, and loading of records into this first version of the joint catalog will begin in March 2010. Looking towards version 2 of the catalog, the HathiTrust-partner working group began reviewing its scope and membership needs as its purview expands beyond bibliographic metadata in the catalog to include the integration of features such as full-text search and the HathiTrust Collection Builder. The group was renamed the HathiTrust Discovery Interface Working Group (from HathiTrust/OCLC Catalog) to reflect this broadening scope. In December, the HathiTrust Executive Committee approved a proposal to have the working group report to the Strategic Advisory Board (SAB), ensuring stronger alignment of the development and delivery of discovery services with future directions in HathiTrust as a whole.

Collaborative Development Environment – Staff at the University of Michigan completed setup of one of the servers that will be used in the initial proof-of-concept partner development environment. The server is configured with all of the tools and software needed to support the PageTurner development that the University of California and Michigan engaged in collaboratively in 2009. A developer at UC has begun to test features of the environment and will be reporting and providing feedback to the working group when the full group is re-engaged in January.

Research Center – The RFP produced by the working group was approved by the Executive Committee in December and is available on the HathiTrust website at http://www.hathitrust.org/documents/hathitrust-research-center-rfp.pdf.

Ingest

Internet Archive Ingest – During the month of December, staff from UC and UM finalized many of the procedures and conventions related to the ingest of Internet Archive-digitized books into HathiTrust. These included file identification, preservation and technical metadata elements, content transformation and validation processes, error logging, and exception handling. UC delivered bibliographic metadata for an initial set of IA-digitized volumes to UM, and UM worked steadily on coding the transformation and validation processes for ingest. An end-to-end pilot test, including download, ingest, and quality review of ingested items, will be performed in late January.

New Programmer For Non-Google Ingest – Applications are still being taken for a programmer to receive and prepare non-Google materials for ingest into HathiTrust. Review of applications and interviews are being conducted simultaneously. The bidding process will close in mid-January, but will be extended again if an applicant is not selected. Full-time and part-time positions are being considered, and it is increasingly likely that one of each may be filled.

Development Updates

Shibboleth – In the near future, HathiTrust will implement Shibboleth as a mechanism for inter-institutional authentication to HathiTrust. Distributed authentication will make it easier for users to take advantage of personalized services in HathiTrust, such as the Collection Builder. It will also enable the delivery of enhanced services to HathiTrust partner institutions. Staff at UM discussed the implementation strategy for Shibboleth in December and installed the Shibboleth service provider software on development servers to begin the work of integration. A forecast for the timeline of implementation will be included in the next update.

Large-scale Search – Staff at UM continue to refine the daily index update and release workflow, making it more resilient to problems that are sometimes encountered during indexing. New server equipment will soon be purchased for use at the Indiana site, and a schedule will be projected for continuous new hardware acquisition to maintain performance levels as the size of the index grows. As part of index and query response time testing, UM staff also updated and released a revised cache-warming procedure based on production log analysis. Warming (pre-populating) the cache of completed queries improves search performance.
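
A minimal sketch of log-driven cache warming follows. It assumes a hypothetical log format with one query string per line and a caller-supplied `run_query` function; neither reflects the actual HathiTrust logs or search code.

```python
from collections import Counter
from typing import Callable, Iterable, List


def top_queries(log_lines: Iterable[str], limit: int) -> List[str]:
    """Return the `limit` most frequent queries, most frequent first."""
    counts = Counter(line.strip().lower() for line in log_lines if line.strip())
    return [query for query, _ in counts.most_common(limit)]


def warm_cache(log_lines: Iterable[str], run_query: Callable[[str], object],
               limit: int = 100) -> int:
    """Replay the most frequent logged queries so their results are cached.

    Returns the number of queries replayed.
    """
    queries = top_queries(log_lines, limit)
    for query in queries:
        run_query(query)  # result is discarded; the side effect is a warm cache
    return len(queries)
```

Replaying only the most frequent queries keeps the warming pass short while covering the searches users are most likely to repeat.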

Outages – There were no outages in December.

Partner News

(What is your institution doing with HathiTrust? Let us know!)

UC and SFX – A University of California group has started work on a project to demonstrate proof-of-concept success in exposing HathiTrust public domain books through UC’s UC-eLinks service (SFX). The project is investigating the various HathiTrust APIs capable of supporting this service and, in addition to gathering usage statistics for the new target, will report on the functionality, usefulness, and viability of each of the APIs for future endeavors. The target will eventually be made available to ExLibris so that it can be added to the SFX package for all customers, but will be available to HathiTrust partners who use SFX before then.

New Growth

Number of volumes added:

                           December      Total
Indiana University           16,923    133,482
Penn State University           233      5,016
University of California    263,089  1,155,367
University of Michigan      230,881  3,659,874
University of Wisconsin      12,137    267,353
Total                       516,514  5,221,092

  • 41,006 public domain volumes were added in December, bringing the total number of public domain volumes to 758,947 (approximately 15% of total content).
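
The rounded share quoted above can be reproduced from the two totals reported in this update. The sketch below is only a worked calculation; the function name `share` is hypothetical.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a fraction."""
    return part / whole


# Public domain total and overall total, as reported for December.
december_share = share(758_947, 5_221_092)
print(f"{december_share:.1%}")  # 14.5%
```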

January Forecast

  • Staff visit to Columbia
  • Begin Internet Archive ingest pilot
  • Discuss the development of a validation mechanism for repository content using the Data API
  • Begin to explore ingest and delivery of born-digital objects
  • Finalize a draft report and recommendation on a third instance of HathiTrust storage

Update on November 2009 Activities

December 11, 2009

Top News

Release of Large-scale Search Application – On November 19, HathiTrust launched a new service enabling full-text search of all volumes in the repository. Indexing of newly ingested volumes is ongoing, but the release of the first production index (containing approximately 4.6 million volumes) is the culmination of more than a year of research and benchmark testing conducted by staff at the University of Michigan. This new service dramatically changes the way researchers are able to use our collections and, along with the release of the bibliographic catalog in May, demonstrates HathiTrust’s commitment to providing sophisticated ways of accessing and using collections preserved in the digital repository. The official news release is available at http://www.ns.umich.edu/htdocs/releases/story.php?id=7426. More can be read about large-scale search in the Development Updates section below.

Development Opportunities – This month we provide the second in a series of ‘columns’ about development opportunities in HathiTrust. These are opportunities that have been identified by HathiTrust partners, and are available to HathiTrust partners, to create key systems or services that will benefit the partnership as a whole. Each month we will provide a brief description of one of these opportunities, give a sense of the level of priority that it has, and provide additional information about what might be involved in developing and supporting it. The opportunities are also listed on the HathiTrust website at http://www.hathitrust.org/projects. The opportunity described this month is usage reporting.

Usage reporting

Description: A clearer sense of the level of use of library materials in HathiTrust will help shape extended activities such as collection management and further digitization. Volumes in HathiTrust may, in some cases, be read in their entirety, while in other cases they may only be searched. To what extent are search-only materials viewed? Which works that are fully viewable are displayed? Where does that access originate? As HathiTrust introduces authentication, to what extent do users authenticate to get access to a fuller array of services? How frequently is the HathiTrust catalog searched, and how does that use compare to the use of full text indexes? These are some of the questions that an improved service for usage reporting will help to answer.

Resources available: HathiTrust retains raw log data and registers some uses through Google Analytics.

Priority: moderate

Additional details: An institution that undertakes this work must:

  • clearly outline a commitment to undertake appropriate measures with regard to user privacy (e.g., with regard to IP addresses and, at such time that HathiTrust implements Shibboleth, user authentication information). Such efforts should include secure storage of sensitive data, appropriate aggregation of data so as to anonymize use by specific individuals, and a commitment to not transfer private user data to a third party;
  • outline a process for design and specifications with a group of interested partner libraries;
  • give consideration to producing reports consistent with appropriate library community standards (e.g., COUNTER and SUSHI).
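
The aggregation requirement in the first item can be pictured with a small sketch that reduces individual log events to counts per day and access type, leaving IP addresses out of the report. The event fields (`date`, `access_type`, `ip`) and the sample records are hypothetical, and aggregation alone may not be sufficient to anonymize use in practice.

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate_usage(events: Iterable[Dict[str, str]]) -> Counter:
    """Count events per (date, access_type); IP addresses are never copied out."""
    counts: Counter = Counter()
    for event in events:
        counts[(event["date"], event["access_type"])] += 1
    return counts


# Made-up events using documentation-only IP addresses.
events = [
    {"date": "2009-11-20", "access_type": "full_view", "ip": "192.0.2.10"},
    {"date": "2009-11-20", "access_type": "full_view", "ip": "192.0.2.11"},
    {"date": "2009-11-20", "access_type": "search_only", "ip": "192.0.2.10"},
]
report = aggregate_usage(events)
print(report[("2009-11-20", "full_view")])  # 2
```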

Working Group on Computational Research Center – Working group members provided final feedback on the Call for Proposals for a HathiTrust Research Center that will be distributed to HathiTrust institutions in December. The Research Center will make textual and image data in HathiTrust available for a wide variety of computational research and analysis purposes, including research in areas of digital humanities, linguistics, automated translation, and searching and indexing techniques.

Working Group on Collaborative Development Environment – Additional effort devoted to the release of Large-scale Search in November delayed further progress on the development environment, but it is now a prime area of focus. The first milestone, a preliminary proof-of-concept environment that supports current development efforts, will be ready for developers at the University of Michigan and the University of California to begin testing in the first half of December. Once this milestone is reached, the working group will be re-engaged to discuss the current provisions of the development environment and explore next steps.

New Programmer For Non-Google Ingest – Staff at the University of Michigan received and reviewed applications for a position to aid in the transformation and modification of non-Google content for ingest into HathiTrust. Five candidates were interviewed by phone in November and three were invited for in-person interviews. After a period of review, the search committee decided to continue the search and repost the position. An additional avenue, involving hiring one or more part-time student employees operating with close supervision, is also being considered.

Internet Archive Ingest – Much progress was made toward the ingest of content digitized by the Internet Archive in November. The University of California shared specifications for a preferred set of files to be downloaded into HathiTrust with the broader community of Internet Archive digitization partners, and received constructive feedback from the group. In continued weekly calls, staff at UC and UM discussed procedures and conventions for content transformation, file identification, and preservation and technical metadata, as well as error logging, exception handling, and policy issues surrounding the deposit of digital objects. The ingest team is working to have practices surrounding many of these issues finalized by mid-December, when UC will deliver bibliographic metadata for an initial set of IA-digitized volumes to UM. Once the transformation and validation processes for ingest have been finalized and coded, UM will conduct a pilot test, downloading and ingesting this initial set of volumes. It is hoped that the full pilot, including quality review of ingested volumes, will be completed by mid-January.

Changes to Tab-delimited HathiTrust Metadata Files – As of December 1, rights determination reason codes are included in the metadata files available for download at http://www.hathitrust.org/hathifiles. Please see the file specification at http://www.hathitrust.org/hathifiles_metadata for updated information.

Development Updates

Large-scale Search – The launch of HathiTrust’s large-scale search application was postponed in October in order to acquire additional hardware to accommodate new index growth. Due to a variety of factors, including a delay in hardware delivery, staff at the University of Michigan altered their index storage strategy and reconfigured the Solr index servers at Michigan to use the Isilon storage system as a back-end. In addition to solving issues related to the size of the index, moving from existing direct-attached storage to the Isilon network-attached storage more readily accommodates the significant index growth that occurs during routine index optimization. The move to Isilon is a temporary strategy, however, and staff at UM will be investigating alternative options for storing the large-scale search index over the long term.

After the storage reorganization, a small backlog of indexing was completed and a new automatic daily indexing process was developed. The University of Michigan launched the full-text service in mid-November and it is performing well.

With an eye toward achieving full redundancy of the search service, staff at UM implemented a nightly synchronization of the index to the Indiana site. Work toward redundancy is ongoing, however, and will involve further research to determine the optimal size of index shards. The size of index shards will help to determine the optimal number of index servers to deploy to guarantee adequate search performance, as well as the additional server deployments and workflows needed to support continuing testing of the search system, routine indexing, and volume re-indexing. Once this research is complete, additional equipment will be purchased and installed at both the Michigan and Indiana sites as appropriate to establish full redundancy.

In additional ongoing work, staff at UM performed analysis of post-release query logs to improve performance testing and cache warming.

HathiTrust/OCLC Catalog – On November 20th in Chicago, the HathiTrust Discovery Interface team met with the corresponding OCLC-WorldCat Local implementation project team for a productive visioning session on the HathiTrust catalog beyond version 1, which is due in April 2010. Each group shared its long-term vision for the project, and together they began to identify areas of common interest and commitment for the year of work following the release of version 1. The HathiTrust team’s draft vision document is available for review and comment at http://www.hathitrust.org/documents/hathitrust-discovery-vision.pdf.

Ingest – The University of California sent shipments of bibliographic data from its Santa Cruz and San Diego campuses to the University of Michigan for ingest in November, totaling approximately 400,000 volumes. Ingest of these volumes, in addition to 200,000 more that are expected from UC’s North Regional Library Facility, will bring HathiTrust to more than 5 million volumes by the end of the year. UM received an initial shipment of bibliographic metadata from the University of Minnesota in November as well. As these and subsequent records from Minnesota are loaded in HathiTrust, ingest of the digital volumes will begin.

Fewer new volumes than expected were ingested into HathiTrust in November because a large number of volumes were re-processed and made available by Google. Google continually re-processes images and OCR of volumes to make improvements and corrections, and these volumes enter a single queue with newly processed volumes for ingest.

Collection Builder – Following the meeting with OCLC staff in Chicago, the focus of Collection Builder integration has shifted from the temporary catalog to the full-text search application. This move sidesteps the cross-site linking issues that were encountered, and will provide useful experience on which to build Collection Builder inclusion in the HathiTrust Catalog at a later time.

Outages – There were no outages in November. 

New Growth

Number of volumes added:

                           November      Total
Indiana University           32,427    116,559
Penn State University           108      4,783
University of California    105,864    892,278
University of Michigan       11,729  3,428,993
University of Wisconsin      12,511    255,216
Total                       115,890  4,697,829

  • 15,980 public domain volumes were added in November, bringing the total number of public domain volumes to 717,941 (approximately 15% of total content).

December Forecast

  • Refine indexing methods, including frequency of complete index optimization and best index shard size
  • Develop processes for rebuilding the entire index
  • Finalize specifications for content digitized by the Internet Archive and prepare for ingest pilot
  • Add Collection Builder functionality to the HathiTrust full-text search interface

HathiTrust offers full-text search of millions of digitized books and journals

A year after its launch by 25 leading U.S. research libraries, HathiTrust Digital Library announces a service that will transform how researchers use the more than 1.6 billion pages (4.6 million volumes) in its collections.

The breakthrough allows for full-text searching capabilities across the entire library. Researchers can now search public domain and in-copyright works by keyword or phrase.

Based on open source Solr/Lucene technology, the service expands on an experimental search of public domain volumes introduced in November 2008. Full-text search will continue to be supported across the repository as it grows at a rate of hundreds of thousands of volumes every month.

"The HathiTrust partners are pleased to offer a search service that helps mine this growing body of authoritative library materials," said John Wilkin, HathiTrust executive director and associate university librarian at the University of Michigan. "HathiTrust continues to distinguish itself with its reliability and with its efforts to broaden the availability of digitized library collections in the flow of scholarly discourse. We see this valuable discovery service as one in a series of major steps HathiTrust is taking to shed light on this vast body of material."

In combination with the HathiTrust Digital Library's carefully curated bibliographic data, the new functionality allows researchers to more efficiently locate items relevant to their research. It also lays the foundation for future services such as full-text search with faceted browsing, advanced search, "more like this" options, and tools that can be used in computational research.

The effort to provide full-text searching capabilities across the repository has yielded valuable benchmarking data, methods, and code to the broader large-scale search community, said Wilkin.

The HathiTrust partners are committed to developing the repository and its services to meet the long-term needs of their academic communities, and offer a unique resource on the Web for scholarship and research.

HathiTrust (http://www.hathitrust.org) is a collaboration of the thirteen universities of the Committee on Institutional Cooperation, the University of California system, and the University of Virginia, and currently includes digitized volumes from the University of Michigan, University of California, Indiana University, and the University of Wisconsin.

Source: http://www.ns.umich.edu/htdocs/releases/story.php?id=7426

Update on October 2009 Activities

November 13, 2009

Top News

Development Opportunities – This is the first in a regular ‘column’ of development opportunities for HathiTrust. System and software development in HathiTrust is carried out through contributions from HathiTrust partners. Although many HathiTrust systems and services must sit on central servers, our initiative relies on open systems and modularity, making it possible for partner institutions to develop key pieces of functionality. In this new column, each month we will provide a brief description of a system or service that has been proposed by HathiTrust partners, attempt to give a sense of the level of priority for that system or service, and provide additional information about what might be involved in developing and supporting it. These services will also be listed on the HathiTrust website at http://www.hathitrust.org/projects. This month we focus on an opportunity that has arisen directly from the expansion of HathiTrust day-to-day operations and the needs of new partners:

Ingest reporting

Description: The deposit of digital volumes and associated metadata into HathiTrust, referred to as “ingest,” involves a significant number of updates to administrative systems — bibliographic records added, digital volumes ingested, and access rights established. Many data elements will be of interest to the contributing institution, and each institution may drive local processes based on the current status of content in the repository (e.g., the percentage of in-copyright works may highlight the value of performing copyright determination work, or a low number of items available in the Google Return Interface may stimulate exploratory discussions with Google). A system that combines all of the available streams of administrative data into a simple web-based reporting system may have considerable value not only for transparency but also for local decision-making.

Resources available: Staff at the University of Michigan and the University of California have assembled a table of relevant data feeds with a brief description of each in the following document: http://bit.ly/2Jk5mm.

Priority: moderate

Additional details: An institution that undertakes this work must:

  • outline a process for design and specifications with a group of interested HathiTrust partner libraries;
  • in consultation with partner libraries, give consideration to authentication and authorization needs for this system.

Upcoming Opportunities

  • Usage reporting
  • Print holdings database
  • Ingest transformation

HathiTrust participates in grant from Mellon Foundation – With support from the Andrew W. Mellon Foundation, Associate Professor Paul Conway of the University of Michigan is leading a one-year research and planning project to find and test new procedures for validating the quality and usefulness of digital objects in HathiTrust. The short-term goal of the project is to prepare and submit a funding proposal to a federal granting agency to explore possibilities for validating these characteristics through manual and automated methods. The long-term goal is to develop criteria and methods to brand the trustworthiness of volumes in HathiTrust and other digital repositories for fulfilling specific purposes (e.g., reading, printing volumes on demand, and performing computational research). Such a branding or certification process would give assurance that content within a repository is worthy of preservation, and increase the value of that content in broader discussions about storage and management solutions for both digital and print collections.  

Google Summit – At a periodic meeting between Google and partner libraries, HathiTrust members worked with Google on issues related to the ingest of materials digitized by Google. Some topics discussed included strategies for improved metrics with regard to the quality of materials, and volumes rejected as duplicates from Google’s scanning workflow. The metrics discussed around quality could potentially be used to characterize or filter content that enters the repository (e.g., in the case of poor quality, to prevent ingest). The duplicate analysis conducted by HathiTrust partners is now being factored into Google’s continuing development of duplicate detection and return. Evaluators at the University of Michigan will continue to examine volumes returned as duplicates throughout the semester.

Working Group on Computational Research Center – The working group submitted its final report to the HathiTrust Executive Committee in October, containing specifications for a HathiTrust Research Center and a request for proposals from interested HathiTrust institutions to build and host the Research Center. The Executive Committee has reviewed the document and, pending final edits from the working group, will distribute the RFP to the partner institutions in November.

Working Group on Collaborative Development Environment – Michigan staff observed a problem with a hard drive in one of the nodes in the development environment cluster and spent time in October troubleshooting the problem and investigating other potential options for hard drive configuration on the nodes. As a result of this investigation, the system BIOS on all nodes will be upgraded and one of the nodes will need to be rebuilt. Work continues on setting up a preliminary development environment on the first node.

New Programmer For Non-Google Ingest – Applications for a programmer position at the University of Michigan to aid in the transformation and normalization of content to be ingested from a variety of digitization sources have been received and reviewed. UM has started the interview process and hopes to have the new programmer in place as soon as possible. The partners made the decision to centralize this ingest functionality initially in order to expedite the inclusion of non-Google content in the repository. Over time it is expected that individual partners will take a greater role in validating and preparing their content for ingest, leveraging tools and processes that result from this initial investment.

Internet Archive Ingest – Weekly conversations centered on the ingest of content digitized by the Internet Archive continued in October between staff at the University of Michigan and University of California. Particular focus was placed on determining the standard identifier scheme that should be used for the content when it is ingested into HathiTrust. The University of California’s ARK identifiers, which exist for nearly all of its Internet Archive volumes, appear to be the most promising. Staff at UM have begun to test these identifiers in repository processes to detect any issues that may arise.

The University of California revised its set of preferred files to be downloaded from the Internet Archive for inclusion in the HathiTrust ingest package. The spec will be distributed to other IA partners in the near future for comments. UC also engaged in analysis of bibliographic data of IA-digitized files from its different campuses and continued development of an approach to authoritatively identify an institution’s volumes in the Internet Archive.

Upcoming Changes to Tab-delimited HathiTrust Metadata Files – As reported in last month’s update, beginning with the full metadata file produced on December 1, 2009, additional fields will be added to the tab-delimited HathiTrust metadata files that are provided at http://www.hathitrust.org/hathifiles (a description of the files is available at http://www.hathitrust.org/hathifiles_metadata). Fields to be added include the copyright determination reason code and the date the database entry was last updated. With this data included, the tab-delimited files will become an ongoing accessible source for information on how and when rights determinations are made. The new tab-delimited fields will be added to the end of the current record structure in order to minimize any potential disruption for existing users of these files.
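
Why appending limits disruption can be seen in a short sketch: a reader that picks fields by position keeps working when new columns are added to the end of each record. The column positions and names below (`volume_id`, `rights`) and the sample values are hypothetical and are not taken from the file specification.

```python
import csv
import io
from typing import Dict, Iterator

# Hypothetical positions of the fields an existing consumer relies on.
KNOWN_FIELDS = {"volume_id": 0, "rights": 1}


def read_records(tsv_text: str) -> Iterator[Dict[str, str]]:
    """Yield known fields by position, ignoring any trailing columns."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:  # skip blank lines
            continue
        yield {name: row[index] for name, index in KNOWN_FIELDS.items()}


old_format = "vol.001\tpd\n"
# Same record with two extra trailing fields (made-up values).
new_format = "vol.001\tpd\treason_x\t2009-12-01\n"
print(list(read_records(old_format)) == list(read_records(new_format)))  # True
```

A consumer that instead checks for an exact column count would break on the new files, which is the kind of disruption the append-only change is meant to avoid.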

Development Updates

Large-scale Search – Staff at the University of Michigan successfully indexed all volumes in HathiTrust using the newly acquired hardware. However, the official launch of the large-scale search application was postponed in order to acquire additional hardware to accommodate new index growth. The original estimate of storage requirements turned out to be low once common-grams technology was introduced. Common-grams offer significantly better search performance but result in an increased index size. The very large number of volumes ingested into the repository in October contributed to the immediate need for more indexing space as well. Optimization of the index, a process occurring at regular intervals, requires storage space of as much as 3 times the size of the index shard being optimized.
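
To illustrate why common-grams enlarge the index, the sketch below mimics the general idea: wherever a very frequent word sits next to another word, the pair is indexed as an additional token, so phrase queries containing frequent words can match on rarer paired tokens. This is a simplified illustration, not Solr's implementation, and the list of common words is an assumption.

```python
from typing import Iterable, List, Set

COMMON_WORDS: Set[str] = {"the", "of", "and", "to", "a"}  # assumed sample list


def common_grams(tokens: Iterable[str],
                 common: Set[str] = COMMON_WORDS) -> List[str]:
    """Return the original tokens plus a bigram for each adjacent pair
    in which at least one word is common."""
    words = [t.lower() for t in tokens]
    output: List[str] = []
    for i, word in enumerate(words):
        output.append(word)
        if i + 1 < len(words) and (word in common or words[i + 1] in common):
            output.append(f"{word}_{words[i + 1]}")
    return output


tokens = "the origin of species".split()
print(common_grams(tokens))
# ['the', 'the_origin', 'origin', 'origin_of', 'of', 'of_species', 'species']
```

In this example four input words yield seven indexed tokens, which gives a feel for how the extra paired tokens add to index size.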

Faceting of search results, a feature supported by Solr, was further explored in October. Faceting requires the addition of bibliographic data to the full-text index. A faceted index was built across two shards to look for potential problems in scaling. Early indications are that performance is only affected slightly with the facets employed.
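
Conceptually, faceting over a sharded index means counting a bibliographic field's values among the matching documents in each shard and then merging the counts. The sketch below shows that merge step with in-memory data; the field name `language` and the sample records are hypothetical, and Solr performs this work internally rather than through code like this.

```python
from collections import Counter
from typing import Dict, Iterable


def facet_counts(docs: Iterable[Dict[str, str]], field: str) -> Counter:
    """Count the values of one field among a shard's matching documents."""
    return Counter(doc[field] for doc in docs if field in doc)


def merge_facets(per_shard: Iterable[Counter]) -> Counter:
    """Sum the per-shard counts into a single facet result."""
    total: Counter = Counter()
    for counts in per_shard:
        total.update(counts)
    return total


shard_a = [{"language": "eng"}, {"language": "ger"}]
shard_b = [{"language": "eng"}, {"language": "eng"}, {"title": "no language field"}]
merged = merge_facets(facet_counts(shard, "language") for shard in (shard_a, shard_b))
print(merged.most_common())  # [('eng', 3), ('ger', 1)]
```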

HathiTrust/OCLC Catalog – After finalizing metadata requirements for the version 1 catalog in September, the HathiTrust/OCLC Catalog team turned its attention in October to interface requirements. The team is currently finalizing interface requirements for version 1 of the catalog and has agreed to engage in collaborative usability testing during the first quarter of 2010. Meanwhile, OCLC’s e-content synchronization work for HathiTrust remains on schedule, and is expected to be completed by the end of the calendar year.

Ingest – HathiTrust ingested a record 553,963 volumes in October. These included nearly 5,000 volumes from Penn State and initial loads of volumes from the University of California’s Santa Cruz and San Diego campuses. Ingest of volumes from Penn State will continue in November. Subsequent shipments of metadata for up to 600,000 additional volumes from UC campuses are expected in November. Ingest of these volumes will begin shortly thereafter.

Prototype for New HathiTrust PageTurner – Enhancements to the HathiTrust PageTurner application and integration with the open source GnuBook were on hold in October as development efforts at Michigan focused on large-scale search and initial configuration of the collaborative development environment. The collaborative environment will enable staff at the University of California to fully test and troubleshoot GnuBook functionality in production conditions. Development of an “image API” is still needed to deliver page images from the repository for display in GnuBook.

Collection Builder – Michigan further explored integration of Collection Builder functionality into the temporary catalog search interface. Some difficulty was encountered due to cross-site linking restrictions, but options will continue to be explored.

Outages – There were no outages in October. 

New Growth

Number of volumes added:

October Total
Indiana University 64,614 84,132
Penn State University 4,675 4,675
University of California 264,710 786,414
University of Michigan 206,283 3,417,264
University of Wisconsin 20,430 242,705
Total 553,963 4,535,190
  • 60,791 public domain volumes were added in October, bringing the total number of public domain volumes to 701,961 (approximately 15% of total content).

November Forecast

  • Fully deploy comprehensive full-text search
  • Continue to explore facets in full-text search
  • Continue to research solutions for adding Collection Builder functionality to the HathiTrust catalog search interface
  • Begin to develop HathiTrust METS specifications for content digitized by the Internet Archive
  • Begin preparations to conduct usability testing on the HathiTrust/OCLC catalog interface

Update on September 2009 Activities

October 9, 2009 [Download PDF]

Top News

HathiTrust participates in grant from NSF – Sayeed Choudhury of Johns Hopkins University, John Wilkin of the University of Michigan, and Amy Friedlander of the Council on Library and Information Resources (CLIR) are co-PIs in an NSF EAGER grant to determine the needs and requirements for developing an open-access repository for publications arising from NSF-funded research. The PIs will leverage Johns Hopkins’ experience in evaluating digital repositories, HathiTrust’s experience with large-scale infrastructure and ingest of digital objects, and CLIR’s experience and facility in bringing together groups of experts to determine next steps and directions on targeted issues. CLIR will host a series of workshops focusing on technical requirements, business and policy concerns, and organization and operations issues relating to the open-access repository. Johns Hopkins and HathiTrust will evaluate various technical systems based on the recommendations from the workshops. The creation of a sustainable, efficient, and scalable model to deliver the products of NSF-funded research to users at no cost will have a transformative impact on the dissemination and use of this valuable work.

University of Michigan Press Backfile and "Buy a Reprint" Links in HathiTrust – HathiTrust has begun ingest of the majority of the published backfile of the University of Michigan Press. More than 350 volumes are now available in the temporary catalog and the HathiTrust PageTurner, with an option to purchase print copies of many of the volumes in the PageTurner. The collection is the first of what is hoped will become many collections or bibliographies in HathiTrust that are maintained by official sources such as organizations, faculty, and librarians. The partners are still working on a name for these types of collections. More information about the Press partnership, including links to the official press release and the collection itself, is available at http://press.umich.edu/digital/hathi. Full-text search is available within the UM Press collection and all other HathiTrust collections (see the Collection Builder home at http://babel.hathitrust.org/cgi/mb?a=listcs;colltype=pub).

Returned Duplicates — The University of California, the University of Wisconsin, Indiana University, and the University of Michigan have undertaken a review of volumes returned by Google as duplicates to better understand how duplicate determination takes place. During the month of September, staff members evaluated materials that were rejected by Google in August, identifying matches and potential mismatches. Results are currently being compiled and analyzed, and will be presented at the Google Partner Summit.

Working Group On Computational Research Center – The Research Center advisory group has completed its initial round of discussions on the demand, structure, content inclusion, legal considerations and funding of the Research Center. A report on that work will be submitted to the Executive Committee in the coming weeks. The group identified the need for additional strategies to gather specific information about the composition and ongoing use and support of the Research Center. A plan to assemble and incorporate this information should be in place in October as well.

Working Group on Storage – A series of teleconferences has led to the construction and refinement of a table defining the important decision criteria for adding a third instance of HathiTrust storage. By mid-October the group will develop a version of these criteria with institution-specific weighting factors. It will then work to reconcile the weightings and develop a final recommendation.

Working Group on Collaborative Development Environment – Michigan staff have completed operating system installs on the initial development environment equipment. Staff will next configure one of the development servers with the base set of software required to support known demands on the environment, including shared development with staff at the University of California on the HathiTrust PageTurner. The initial configuration will be documented and discussed with the working group for further revisions and enhancements.

New Programmer for Non-Google Ingest – In the near future the HathiTrust partners will hire a developer dedicated to receiving non-Google materials from their respective institutions and preparing them for ingest into the repository. The new hire will speed the addition of these materials to HathiTrust and develop specifications and processes that will be applicable to content from new partners in the future.

Internet Archive Ingest — Staff members from the University of California, the University of Michigan, and the University of Illinois held a teleconference in late September to discuss the file formats for Internet Archive-digitized content that will be included in the HathiTrust book package. The partners are working to build consensus on a package that will meet the needs of all institutions contributing this content. The University of Michigan and University of California held two teleconferences in September to discuss issues surrounding ingest itself, such as book package identifiers and ways of preparing ingested OCR for use in full-text searching and viewing applications.

Upcoming Changes to Tab-delimited HathiTrust Metadata Files — Beginning with the full metadata file produced on December 1, 2009, additional fields will be added to the tab-delimited HathiTrust metadata files that are provided at http://www.hathitrust.org/hathifiles (a description of the files is available at http://www.hathitrust.org/hathifiles_metadata).

Fields to be added include the rights determination reason code and the date of last rights determination. With this data included, the tab-delimited files will become an ongoing accessible source for information on how and when rights determinations are made. The new tab-delimited fields will be added to the end of the current record structure in order to minimize any potential disruption for existing users of these files. More details on this change will be included on the website as they become available.
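
The following Python sketch suggests why appending fields to the end of each record limits disruption. The column names and the sample record are hypothetical (the actual layout is described at the hathifiles_metadata page mentioned above). The point is only that a reader which addresses the existing columns by position keeps working when extra columns appear at the end.

```python
import csv
import io

# Hypothetical names for the leading columns an existing consumer relies on.
KNOWN_FIELDS = ["volume_id", "access", "rights"]


def read_records(tsv_text):
    """Parse tab-delimited records, using only the known leading columns."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        # zip stops at the shorter sequence, so trailing columns are ignored.
        records.append(dict(zip(KNOWN_FIELDS, row)))
    return records


old_file = "vol.001\tallow\tpd\n"
# Same record with two illustrative fields appended at the end.
new_file = "vol.001\tallow\tpd\tbib\t2009-12-01\n"

print(read_records(old_file) == read_records(new_file))  # True
```

Had the new fields been inserted in the middle of the record instead, the positional mapping above would silently assign values to the wrong names.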

Development Updates

Large-scale Search Launch October 19 – In September, the University of Michigan worked to revise and debug production index-building routines to support a comprehensive index of HathiTrust volumes. This index is distributed across five servers with two Solr shards, or index fragments, on each server. In the process of running the routines it was confirmed that Logical Volume Manager (LVM) snapshots could be used effectively to deploy index updates. Concurrent testing of the indexes in the new search environment showed a significant improvement in performance over the current environment, as had been expected. The new full-text search service is targeted for release on October 19. When it is live, the full text of the more than 4 million volumes in HathiTrust will be searchable by anyone with a Web browser. At that time, a new portal interface will replace the current page at http://catalog.hathitrust.org, providing access to full-text search, bibliographic search, and linking to custom collections in the Collection Builder.
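
The layout described above (five servers, two Solr shards on each) can be sketched as follows. The host names and the hash-based assignment of volumes to shards are assumptions for illustration only; the report does not say how volumes are allocated to shards.

```python
import hashlib

SERVERS = [f"solr{n}.example.org" for n in range(1, 6)]  # hypothetical host names
SHARDS_PER_SERVER = 2

# Ten shards in total, each identified by (server, shard number on that server).
SHARDS = [(host, s) for host in SERVERS for s in range(1, SHARDS_PER_SERVER + 1)]


def shard_for(volume_id):
    """Pick a shard deterministically from a hash of the volume identifier."""
    digest = hashlib.sha256(volume_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


print(len(SHARDS))  # 10
print(shard_for("example.volume.0001"))
```

A deterministic assignment of this kind means the same volume always lands in the same index fragment, so re-indexing a volume replaces its earlier entry rather than duplicating it on another shard.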

With the release of full-text search on the horizon, HathiTrust has begun exploring options for offering faceted browsing of content in conjunction with full-text search. The University of Michigan has built and performed preliminary testing on an index of 500,000 volumes that includes metadata suitable for faceting of search results. The tests suggest that the impact of faceting on full-text search performance will be tolerable in the new environment.

Principal developers for the open source Solr software integrated Michigan’s contribution of common-grams code into the Solr code base. It is now a permanent feature of Solr and, of course, the HathiTrust indexing process.

HathiTrust/OCLC Catalog – The HathiTrust/OCLC Catalog team recently reached an agreement on metadata requirements for the version 1 catalog. To finalize these requirements, input was sought from catalogers both within and outside of the regular group. The team is also in the process of finalizing user interface requirements. A face-to-face meeting between OCLC and HathiTrust is being planned for November, where the group will begin to lay out a vision and timeline for version 2 of the catalog.

Ingest – Ingest rates were low in September for two reasons: HathiTrust remained caught up with content made available from Google, and a metadata encoding issue required Google to reprocess a number of volumes before they could be downloaded. Ingest of metadata from Penn State is expected to begin in October, with ingest of content to begin immediately after.

Prototype for New HathiTrust PageTurner — Staff at the University of Michigan investigated ways of altering the current process by which images are transformed for access, in order to produce images that can be used by the GnuBook. A conclusion was not reached and investigation will continue in October. Development work at the University of California will also continue in October, as staff prepare a feature that will allow users to view thumbnail images of the pages in a volume.

Collection Builder – As mentioned above, the UM Press volumes will form the first officially sponsored collection in Collection Builder. Another new feature of the Collection Builder, and the PageTurner application as well, is that users accessing HathiTrust from partnering institution campuses will see the name of their institution in the bottom left corner of the screen. This note will let users know that their institution is supporting the effort to make this content available and ensure its preservation over the long-term. Staff at the University of Michigan continue to work to integrate Collection Builder functionality into the temporary catalog. Negotiating authentication requirements between the two applications has introduced some complications, but options continue to be explored.

Outages – There were no outages in September.

Presentations

iPRES "HathiTrust: Preservation As A Platform For Collaboration and Expanded User Services", October 6 - Jeremy York, Suzanne Chapman, Heather Christenson, and Paul Fogel
PASIG "From Ingest To Access: A Day In The Life Of A HathiTrust Digital Object", October 8 - Jeremy York
NISO Forum "Seamless Sharing: NYU, HathiTrust, ReCAP and the Cloud Library", October 9 - Kat Hagedorn

New Growth

Number of volumes added:

September Total
Indiana University 1,036 19,518
University of California 64,210 521,704
University of Michigan 81,829 3,210,981
University of Wisconsin 7,230 222,275
Total 154,305 3,981,227
  • 36,400 public domain volumes were added in September, bringing the total number of public domain volumes to 641,170 (16% of the total content).

October Forecast

  • Create and maintain full-text indexing and search services in the new production environment.
  • Continue to explore the addition of facets in full-text search. Facets have introduced metadata to the full-text index, and therefore new sorting options, including weighted relevance, will need attention.
  • Continue to investigate potential solutions to the problem of dynamically serving images to GnuBook.

Update on August 2009 Activities

September 11, 2009 [Download PDF]

Top News

Working Group On Computational Research Center – The Research Center proposal planning group has made great progress in the last month. The group has continued discussions on the types of research that could utilize the centers, how results might be shared, and what environments/datasets are best suited to which types of research. In bi-weekly calls, subgroup meetings, and individual interviews, the team has been working through difficult issues such as defining non-consumptive research and recognizing hurdles related to the management and publication of research results. Next steps include developing a draft plan for the infrastructure of the centers and marrying legal and security restrictions with that infrastructure. The group aims to have a draft proposal prepared in early October and a full proposal completed later that month.

Working Group on Development 'sandbox' – Based on a general conversation with the working group and a useful discussion of potential use cases with UC staff during their Ann Arbor visit, staff at the University of Michigan have gathered enough information to start building the development environment. The initial goal is to support all of the current development projects in a single place, and provide a large subset of content with which to work. The new environment will be a substantial improvement over current conditions, and should be a building block for additional capabilities later on, including significant partner development. Michigan has racked, cabled, and started operating system installs on the equipment set aside for the project. When further progress has been made on the base installations the full working group will assemble to discuss the provisions of the environment.

University of Michigan Press Backfile and Reprint Purchase Links in HathiTrust – HathiTrust is collaborating with the University of Michigan Scholarly Publishing Office and the University of Michigan Press to open access to the majority of the published backfile of the UM Press in HathiTrust. The volumes, which are being digitized by the Press, will be available in HathiTrust with an option to purchase a print-on-demand copy in mid to late October.

HathiTrust Disaster Preparedness – Over the summer, an IMLS grant-funded intern in digital preservation performed an in-depth evaluation of disaster preparedness in HathiTrust. The report provides detailed information about the strengths of HathiTrust’s current disaster recovery planning, as well as recommendations for improvements in the short-, intermediate-, and long-term. It is available at http://www.hathitrust.org/technical_reports/HathiTrust_DisasterRecovery.pdf.

Prototype for New HathiTrust PageTurner — Staff from the University of California and University of Michigan held two teleconferences in August to discuss deeper integration of the UC prototype PageTurner into the existing application. Team members discussed strategies for offering full development capabilities on a limited amount of HathiTrust content in advance of the development ‘sandbox’ environment. A working strategy has been reached and a development space should be available in October. UC has continued in the meantime to improve GnuBook functionality with thumbnail views of page images and the ability to display full-text OCR. Staff at UM are investigating ways to alter current processes that make access-quality images available to the PageTurner, to produce images that can be used by the GnuBook.

METS Profile Available — Staff at the University of Michigan have created a version 1.0 METS profile for HathiTrust content, which can be downloaded at http://www.hathitrust.org/preservation. The profile currently applies only to Google content in HathiTrust, but will be updated to reflect requirements for locally-scanned content and volumes digitized by the Internet Archive.

Returned Duplicates — For several years, Google has been working on ways to reduce duplication in its digitization workflow. In August, it implemented processes that use metadata to detect volumes that have been scanned previously at other institutions so identical volumes will not be scanned again. The number of volumes rejected in this de-duplication effort has raised concerns among HathiTrust institutions about the accuracy of Google’s detection processes. The University of California, the University of Wisconsin, Indiana University, and the University of Michigan have undertaken a review of volumes returned as duplicates to better understand how duplicate determination takes place. The four universities have identified a target set of materials to review and are finalizing methodology to perform a manual evaluation. It is hoped that the results will be available for the Google library partner summit later this month.

Mobile Interface — Michigan made significant progress on the development of a mobile interface to the HathiTrust Catalog in August. The work continues, and staff will next turn their attention to the PageTurner application. Initial development will be followed by user testing for both applications.

Development Updates

Large-scale Search – After additional search performance testing in August, an improved index configuration was established by staff at the University of Michigan using a punctuation filter and a list of 400 common words (see blog post for details: http://www.hathitrust.org/blogs/large-scale-search/tuning-search-perform...). This index configuration will be put into production on the new dedicated server hardware, which was installed in August. Michigan also completed additions to the indexing control software (SLIP) to support distribution of indexing across several servers, each with multiple Solr index shards. A continuous indexing strategy for this distributed system and corresponding requirements for storage configuration and scripting have been implemented, and the first indexing tests will have begun by the time this report is published.

Ingest – The number of volumes ingested dropped significantly in August as ingest rates caught up with the rate at which partner content was made available from Google.

Data API – Ed Summers provided insightful and constructive feedback on the HathiTrust Data API in a blog posting in mid-August (http://inkdroid.org/journal/2009/08/13/open-to-view/). The comments are being reviewed by University of Michigan staff.

Collection Builder – Two new APIs for Collection Builder are being tested by staff at Michigan. The first returns the list of collections owned by a user. The second adds multiple items to a collection. These APIs will support future integration of Collection Builder functionality into other applications, such as the HathiTrust temporary catalog.

Outages – On Wednesday August 5 from 8:15pm to 9:30pm EDT, service was degraded (service may have been unavailable to some users) due to a storage system problem at the Indiana site. From Sunday August 23 at 6:30pm EDT to Monday August 24 at 8:00am EDT, on Wednesday August 26 from 5:00pm to 6:00pm EDT, and on Friday August 28 from 7:25pm to 8:35pm EDT, service was degraded due to network connectivity problems to database servers.

Software and firmware upgrades were performed during the weeks of August 10 and 17 at both sites without incident or interruptions in service. The upgrades conducted during the week of August 17 were preventative in nature and addressed a hardware problem, discovered by the storage system provider, that was the underlying cause of the service disruption on August 5.

The cause of the other outages has been thoroughly researched but is still not known; workarounds that eliminate any service impact have been put into place, systems are being monitored, and investigation into the problem continues.

New Growth

Number of volumes added:

August Total
Indiana University -- 18,482
University of California 148,810 457,494
University of Michigan 58,878 3,129,152
University of Wisconsin -- 215,045
Total 207,688 3,820,173
  • 23,434 public domain volumes were added in August, bringing the total number of public domain volumes to 604,770 (16% of the total content).

September Forecast

  • Test large scale search performance on new dedicated server hardware.
  • Begin working with facets in large-scale search and continue testing performance variables including common-grams and punctuation.
  • Add reprint purchase links to the HathiTrust interface for UM Press items.
  • Continue development of mobile interfaces for the temporary catalog and PageTurner
  • Establish a collaborative development environment for the HathiTrust Page Turner.

Update on July 2009 Activities

August 14, 2009 [Download PDF]

Top News

UC Staff Visit Ann Arbor – HathiTrust project leads from the California Digital Library joined staff at the University of Michigan for two days of intense and fruitful discussion and planning from July 20-21. The teams consulted on a variety of forward-looking topics including a roadmap for the ingest of content digitized by the Internet Archive, strategies for future bibliographic metadata management, the challenges of providing help and feedback to users in a virtual library with multiple constituencies and stakeholders, HathiTrust PageTurner development, and creating infrastructure for collaborative development efforts. Several new planning efforts were initiated as a result of these discussions and both partners came away believing the visit had helped them to further coordinate efforts and was instrumental to continuing their successes in the future.

New HathiTrust Working Group On Storage – A new working group has been convened to explore the possibility of securing a third instance of storage for HathiTrust in the western United States. The working group members include Stephen Abrams, California Digital Library (co-chair), John Kunze, California Digital Library (co-chair), Luc Declerck, University of California San Diego, Rob Lowden, Indiana University, David Minor, University of California San Diego, and Cory Snavely, University of Michigan. If a third instance of storage is recommended, the group will investigate a variety of technical, management, and organizational issues involved in implementation.

Working Group On Computational Research Center – The Research Centers working group has been hard at work over the last month. The participants (please see the June update) have been engaging in a series of conference calls discussing issues related to the creation of the centers, including the types of research that will be done, the environment needed to support such research, and legal restrictions surrounding the use of the data. The group will continue to discuss these issues and others, such as funding sources and derivative research resulting from HathiTrust data use, in calls throughout August and September.

Working Group on Development 'sandbox' – The Development Environment working group convened for the first time in mid-July via teleconference to discuss the scope of the environment, the contexts in which development will occur (remote development versus local, specific use cases and desired features), and working group logistics. The group identified current applications such as the HathiTrust PageTurner and Collection Builder, as well as GROOVE (HathiTrust’s ingest mechanism), as priority systems to be made available in the development space, and conferred about particular ways that work will be done, such as code versioning. The development environment was a focus of one of the sessions during the meeting between California Digital Library and University of Michigan staff mentioned above, where further discussion on these issues took place. In the coming weeks, team members at Michigan will prepare hardware that has been set aside for the project and do preliminary configuration of the environment on that hardware.

Prototype for New HathiTrust PageTurner — Collaboration between the California Digital Library and the University of Michigan to enhance the HathiTrust Page Turner with GnuBook functionality continued in July, primarily in the form of discussions about division of labor and the establishment of a basic collaborative work environment. A new planning and development team with staff from both institutions met in mid-August to kick off the next phase of GnuBook and PageTurner development.

HathiTrust-OCLC Catalog Project – The HathiTrust WorldCat Local Implementation team is nearing the completion of a high-level requirements document for the version 1 catalog, with a target deadline of August 31, 2009. The team also began to document usability issues and suggestions for the proposed interface. OCLC has begun working on the e-content synchronization process that will bring HathiTrust’s records into WorldCat Local.

In striving to create a consistent user experience of HathiTrust, the team has turned to user feedback on the temporary beta catalog (http://catalog.hathitrust.org/).

HathiTrust Statistics — Member institutions have identified the need to make statistics about how HathiTrust is being used more broadly available within the partnership. As a provisional measure, access statistics gathered by Google Analytics are being provided to representatives at these institutions. While these analytics will be useful in the short-term, there is a need for a reporting tool that will provide more granular information, such as usage by institution and by format, in the future.

Development Updates

Large-scale Search – University of Michigan staff investigated the indexing problems with the beta large-scale search that were reported in the last update. The problems were due to a shortage of available memory. However, a decision was taken to wait for new hardware to be deployed before taking further action. The new hardware, purchased in June to support large-scale search, was received in July, and is currently being prepared for testing and use. With the new hardware in place, full-text search of all volumes in HathiTrust is planned to be available by October 1st.

UM staff made refinements to the custom punctuation filter for large scale search, and ran tests only to discover the filter did not provide the performance boost anticipated. The punctuation filter has been set aside temporarily, but has potential for future implementation. Tests conducted by staff to compare response times for common-grams Solr indexes in various configurations resulted in a new emphasis being placed on the importance of a well-tuned list of common words. A new program that evaluates the total number of term occurrences for the most frequently occurring words in an index was created to aid in the selection of common words for this list. Additional details can be found on the HathiTrust Large Scale Search Blog (http://www.hathitrust.org/blogs/large-scale-search/). Four new posts were added to the blog in July.
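
A program of the kind described, one that totals occurrences of the most frequent words to help choose a common-word list, might look like the sketch below. This is an assumption about the general approach, not the HathiTrust tool: it counts lowercased, whitespace-separated tokens in a small in-memory sample rather than reading term statistics from a Solr index.

```python
from collections import Counter


def top_terms(documents, n):
    """Return the n most frequent terms with their total occurrence counts."""
    counts = Counter()
    for text in documents:
        counts.update(text.lower().split())
    return counts.most_common(n)


sample = [
    "the history of the university",
    "the library of the future",
    "a history of printing",
]
for term, total in top_terms(sample, 3):
    print(term, total)
# the 4
# of 3
# history 2
```

Ranking by total occurrences rather than by the number of documents containing a word favors the terms that contribute most to the length of the lists a phrase query has to process, which is the cost common-grams are meant to reduce.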

Ingest – Ingest was slowed in July by the discovery that Google was making volumes available for ingest that did not contain the required descriptive metadata. Google addressed the problem and ingest continued as normal after these volumes were re-ingested.

Data API – University of Michigan staff responded to feedback received from California Digital Library on the Data API and discussion of the API continued when CDL visited Michigan. Key issues that have arisen are security and determining how much functionality should be built into the baseline API.

Collection Builder – Michigan explored solutions for integrating Collection Builder functionality into the temporary HathiTrust Catalog. Planned improvements would allow users to save multiple items to a public or private collection directly from a search results or bibliographic record listing in the catalog.

Outages – At 8:15pm EDT, Wednesday, August 5th, an incident (that we are currently investigating) at the Indianapolis data center caused HathiTrust storage at that site to be unavailable for 1 hour and 15 minutes. During that time the entire Ann Arbor node of HathiTrust as well as web servers at the Indianapolis node continued to be available for users. Our current load balancing and failover strategy does not adequately account for this sort of partial failure. In the worst case, a user whose browser was directed to the Indianapolis site may have been unable to view books in the repository during the period from 8:15-9:30pm EDT. For most users, however, load balancing would have directed their browsers to the Ann Arbor site during this period. In the coming year, we will be replacing mechanisms that currently handle load balancing and failover, and will devote attention to developing a more nuanced failover strategy.
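
The partial failure described here, in which a site's web servers are reachable but its storage is not, suggests routing on the health of every component a request depends on. The sketch below is hypothetical: the site names, the component flags, and the `choose_site` function are illustrative assumptions, not the mechanism HathiTrust uses or plans to deploy.

```python
def site_can_serve(status):
    """A site is usable only if both its web tier and its storage are up."""
    return status["web"] and status["storage"]


def choose_site(sites, preferred):
    """Return the preferred site if fully healthy, else another healthy site, else None."""
    if site_can_serve(sites[preferred]):
        return preferred
    for name, status in sites.items():
        if site_can_serve(status):
            return name
    return None


# State resembling the incident: Indianapolis web servers up, storage down.
sites = {
    "indianapolis": {"web": True, "storage": False},
    "ann_arbor": {"web": True, "storage": True},
}
print(choose_site(sites, "indianapolis"))  # ann_arbor
```

A check that looked only at the `web` flag would have kept sending users to Indianapolis during the incident, which is the gap in the current strategy that the paragraph above describes.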

New Growth

Number of volumes added:

July Total
Indiana University 601 18,482
University of California 109,403 308,648
University of Michigan 187,903 3,070,274
University of Wisconsin 3,707 215,045
Total 301,614 3,612,449
  • 47,028 public domain volumes were added in July, bringing the total number of public domain volumes to 581,336 (16% of the total content).

August Forecast

  • Establish a collaborative development environment for the HathiTrust Page Turner.
  • Test large scale search performance on new dedicated server hardware.
  • Begin working with facets in large-scale search and continue testing performance variables including common-grams and punctuation.
  • Work on enhancements to the HathiTrust interface, most likely in Collection Builder.
  • Work to integrate Collection Builder functionality with the Catalog.
  • Develop beta mobile interfaces for the Catalog and Page Turner to the point initial user testing can be conducted.

Update on June 2009 Activities

July 10, 2009 [Download PDF]

Top News

New Working Group on Computational Research Center – June was an exciting month for HathiTrust, both in terms of repository development and in terms of deepening collaboration among the HathiTrust Partners. Calls sent out in May for participation in two HathiTrust working groups were answered, and membership in both the Research Center and Development ‘sandbox’ groups was finalized. Members of the Research Center working group, which will develop a proposal for a Research Center to be created under the terms of the Google Settlement, include Steven Abney (University of Michigan), Jack Bernard (University of Michigan), Geoffrey Fox (Indiana University), David Goldberg (University of California Irvine), Robert McDonald (Indiana University), Qiaozhu Mei (University of Michigan), John Ober (California Digital Library), Beth Plale (Indiana University), Scott Poole (University of Illinois), Sarah Shreeves (University of Illinois), and John Unsworth (University of Illinois). The group will be coordinated by Kat Hagedorn, with project support to be provided by Jeremy York from HathiTrust.

Working Group on Development 'sandbox' – The Development ‘sandbox’ working group, which will work to create a development environment for partners to build and test repository applications and services, includes Stephen Abrams (California Digital Library), Albert Bertram (University of Michigan), Lynne Cameron (California Digital Library), Kaylea Champion (University of Chicago), Stephanie Collett (California Digital Library), Steve DiDomenico (Northwestern University), Bill Dueber (University of Michigan), Mike Durbin (Indiana University), Phil Farber (University of Michigan), Paul Fogel (California Digital Library), Eric Hetzner (California Digital Library), Sebastien Korner (University of Michigan), John Kunze (California Digital Library), David Loy (California Digital Library), Andy Mardesich (California Digital Library), Mairéad Martin (Pennsylvania State University), Jon Miller (University of Chicago), David Minor (San Diego Supercomputer Center), Bill Parod (Northwestern University), and Cory Snavely (University of Michigan).

Prototype for New HathiTrust PageTurner – As plans for the Development environment moved forward, the University of Michigan and California Digital Library (CDL) continued to explore possibilities for integrating the GnuBook reader into the current HathiTrust PageTurner to expand PageTurner’s features and capabilities. CDL created a prototype GnuBook-integrated page turner application using repository code and a sample volume made available by the University of Michigan. Staff at the University of Michigan are currently testing the prototype’s functionality and will work with CDL in July to determine the next steps for development. This collaboration is exciting not only because of the enhancements it will bring to the existing PageTurner application, but also because it demonstrates how shared development will enhance the services and capabilities HathiTrust is able to offer.

CDL staff to visit Ann Arbor – Collaborating to enhance services and capabilities is the major theme of a visit that HathiTrust team members from the California Digital Library will make to the University of Michigan in July. In a series of focused meetings on July 20th and 21st, staff from both institutions will discuss a range of topics, including the ingest of Internet Archive and other non-Google content, development of the HathiTrust PageTurner, communication about HathiTrust, and future development directions.

HathiTrust-OCLC Catalog Project – June was an important month for discussions about the HathiTrust-OCLC catalog, particularly regarding metadata and holdings information functions and display. At each juncture, the project team has prioritized meeting the unique needs of HathiTrust’s all-digital catalog while maintaining consistency across the entire WorldCat database. For example, the team recently discussed how to accommodate viewability levels (e.g., search only, full-text, or a mix of the two in multi-volume sets) that do not occur in any other WorldCat records. The team has also focused on strategies for displaying and faceting on HathiTrust’s many contributing institutions in a way that would differentiate this information from print holdings.

In striving to create a consistent user experience of HathiTrust, the team has turned to user feedback on the temporary beta catalog (http://catalog.hathitrust.org/). Future months will see increased focus on display and interface concerns as well as functionality issues.

HathiTrust.org Website Reorganization – Because of the evolving nature of HathiTrust and the additional information that needed to be incorporated into the HathiTrust website, staff at the University of Michigan undertook a comprehensive reorganization of the website. The website retains the same look and feel, but information about areas such as preservation, rights management, partnership, and access has been more clearly separated and defined so that it is easier to locate. As part of the changes, additional information has been added about the requirements and benefits of becoming a partner, how to become a partner, and the costs of partnership.

Strategic Advisory Board Meeting Minutes – The HathiTrust Strategic Advisory Board (SAB) met for the first time on June 17. The minutes of this meeting are posted on the HathiTrust website at http://www.hathitrust.org/sab. Future minutes of the SAB will be posted there as well.

Update on May 2009 Activities

June 12, 2009

Top News

Formation of Working Groups on Research Center and Development 'sandbox' – HathiTrust issued calls last month for names of participants in two new working groups: one to develop a proposal for a Research Center to be created under the terms of the Google Settlement, and one to create a development environment for HathiTrust partners to build and test repository applications and services. It is expected that membership of these two groups will be finalized in June.

Update on March 2009 Activities

HathiTrust
Update on March 2009 Activities
April 10, 2009

News:

  • General News

Update on February 2009 Activities

HathiTrust
Update on February 2009 Activities
March 13, 2009

News:

  • General News
    • Strategic Advisory Board – Significant progress has been made towards the formation and formal charge of a Strategic Advisory Board for HathiTrust. The Strategic Advisory Board will guide HathiTrust development efforts, convene task forces to address specific issues such as cross-institutional technology development and de-duplication, and develop policies for HathiTrust and its partners.

Update on January 2009 Activities

HathiTrust
Update on January 2009 Activities
February 13, 2009

News:

  • General News
    • Public ‘Discovery’ Interface and OCLC collaboration – We have begun active planning discussions with OCLC on the creation of a “catalog” for HathiTrust. Chaired by Lee Konrad (Wisconsin) and John Butler (Minnesota), this group will create specifications for adaptation of WorldCat Local (WCL) for HathiTrust. The deployment of the HathiTrust WCL interface is scheduled for early 2010, with work ongoing throughout 2009.

Update on December 2008 Activities

HathiTrust
Update on December 2008 Activities
January 9, 2009

News:

  • General News

Update on November 2008 Activities

HathiTrust
Update on November 2008 Activities
December 12, 2008

News

  • Deployment Status
    • Establishing Indiana mirror site – After a period of troubleshooting, network performance tuning, and a number of network hardware upgrades, data synchronization between Ann Arbor and Indianapolis was completed and routinized.
  • Development Update

Update on October 2008 Activities

HathiTrust
Update on October 2008 Activities
November 14, 2008

News

  • The Committee on Institutional Cooperation (CIC) and the University of California join forces to launch HathiTrust – On October 13th, the 13 libraries of the CIC and the 11 libraries of the University of California (UC) system jointly announced the launch of HathiTrust. The California Digital Library will coordinate UC’s participation. The University of Virginia joins HathiTrust as one of the first participants using the infrastructure built by the partners.

Update on September 2008 Activities

HathiTrust
Update on September 2008 Activities
October 10, 2008

News

  • HathiTrust web site – The HathiTrust web site, http://www.hathitrust.org/, continues to evolve. This month, we begin a process of providing more detailed reports on specific development initiatives (e.g., large-scale search). We have also added a dynamically updated report of the size of the HathiTrust repository, shown in the sidebar on the opening page of the website and on the Updates page.
  • Deployment Status

Update on August 2008 Activities

HathiTrust
Update on August 2008 Activities
September 12, 2008

News

  • Release of the HathiTrust web site – HathiTrust released a web site at http://www.hathitrust.org/. Currently, the primary purpose of the website is to share information with partners and prospective partners, and to bring together technical documentation and resources like the HathiTrust API. In the future, we will work to develop the website into a mechanism by which users can explore the content in the HathiTrust repository.

Update on June 2008 Activities

HathiTrust
Update on June 2008 Activities
11 July 2008

Update on July 2008 Activities

HathiTrust
Update on July 2008 Activities
8 August 2008

Update on March 2008 Activities

Shared Digital Repository
March 2008 Update
11 April 2008

Update on April 2008 Activities

Shared Digital Repository
Update on April 2008 Activities
9 May 2008

This is the second regular update on activities in the Shared Digital Repository (SDR). These updates will be made available monthly, typically on the 2nd Friday of the month, and will provide a variety of information about the general health of the repository and updates on the development of the SDR. Each update will be sent via e-mail to an official representative (typically the library director) of a participating institution, and will be posted on the SDR website.

Update on May 2008 Activities

Shared Digital Repository
Update on May 2008 Activities
13 June 2008
