Projects

 

HathiTrust is involved in a number of projects, both internal and external to the repository and partnership. These include grant projects, working groups HathiTrust has assembled on various issues, and repository development opportunities in which the partners will engage.

Grant Projects

  • National Science Foundation EAGER grant, in conjunction with Johns Hopkins University and CLIR (the Council on Library and Information Resources) to explore the feasibility of an open access repository for NSF-funded research.

Summary from the Update on September Activities:

Sayeed Choudhury of Johns Hopkins University, John Wilkin of the University of Michigan, and Amy Friedlander of the Council on Library and Information Resources (CLIR) are co-PIs in an NSF EAGER grant to determine the needs and requirements for developing an open-access repository for publications arising from NSF-funded research. The PIs will leverage Johns Hopkins’ experience in evaluating digital repositories, HathiTrust’s experience with large-scale infrastructure and ingest of digital objects, and CLIR’s experience and facility in bringing together groups of experts to determine next steps and directions on targeted issues. CLIR will host a series of workshops focusing on technical requirements, business and policy concerns, and organization and operations issues relating to the open-access repository. Johns Hopkins and HathiTrust will evaluate various technical systems based on the recommendations from the workshops. The creation of a sustainable, efficient, and scalable model to deliver the products of NSF-funded research to users at no cost will have a transformative impact on the dissemination and use of this valuable work.

  • Andrew W. Mellon Foundation grant led by University of Michigan professor Paul Conway.

Summary from the Update on October Activities:

With support from the Andrew W. Mellon Foundation, Associate Professor Paul Conway of the University of Michigan is leading a one-year research and planning project to find and test new procedures for validating the quality and usefulness of digital objects in HathiTrust. The short-term goal of the project is to prepare and submit a funding proposal to a federal granting agency to explore possibilities for validating these characteristics through manual and automated methods. The long-term goal is to develop criteria and methods to brand the trustworthiness of volumes in HathiTrust and other digital repositories for fulfilling specific purposes (such as reading, printing volumes on demand, performing computational research, and others). Such a branding or certification process would give assurance that content within a repository is worthy of preservation, and increase the value of that content in broader discussions about storage and management solutions for both digital and print collections. More information is available at http://blog.si.umich.edu/2009/09/28/mellon-grant-aids-researching-criter....

Working Groups

  • HathiTrust Discovery Interface Working Group   

In April 2009, staff from OCLC and HathiTrust institutions formed a working group to envision and implement a production-level bibliographic catalog for the HathiTrust collections. Members of the working group include Adam Brin (California Digital Library), John Butler (University of Minnesota), Bill Carney (OCLC), Suzanne Chapman (University of Michigan), Kevin Clair (Pennsylvania State University), Lee Conrad (University of Wisconsin), Lisa German (Pennsylvania State University), Julia Lovett (University of Michigan), Jon Rothman (University of Michigan), Christopher Walker (Pennsylvania State University), and John Wilkin (University of Michigan). Updates on the progress of this joint effort are available in the HathiTrust monthly newsletter beginning in April 2009. The catalog itself is targeted for release in April 2010. A vision for HathiTrust discovery beyond this initial release was produced by HathiTrust team members in November 2009 and is available at http://www.hathitrust.org/documents/hathitrust-discovery-vision.pdf

  • HathiTrust Research Center

During July and August 2009, individuals from multiple partner institutions assembled to explore the needs and requirements of establishing one of the Research Centers that are provided for in the Google Settlement. Through bi-weekly discussions and individual interviews, the group gathered specifications for a Research Center hosted by a HathiTrust institution, including specifications for a Center composed of HathiTrust data alone should the Settlement not be approved. These were submitted to the Executive Committee in October with a recommendation that a request for proposals (RFP) based on the specifications be sent to interested HathiTrust partner institutions to build and sustain the Research Center. The recommendation was approved and, following final revisions from the working group, the RFP is now available at http://www.hathitrust.org/documents/hathitrust-research-center-rfp.pdf.

Working group members include Steven Abney (University of Michigan), Jack Bernard (University of Michigan), Geoffrey Fox (Indiana University), David Goldberg (University of California Irvine), Robert McDonald (Indiana University), Qiaozhu Mei (University of Michigan), John Ober (California Digital Library), Beth Plale (Indiana University), Scott Poole (University of Illinois), Sarah Shreeves (University of Illinois), and John Unsworth (University of Illinois). The group is coordinated by Kat Hagedorn, with project support provided by Jeremy York from HathiTrust.

  • HathiTrust Collaborative Development Environment

Early on, partners identified a need to support distributed development of tools and applications in HathiTrust, both to serve the particular needs of local campuses and to improve centralized repository functionality and reporting capabilities. The Collaborative Development Environment working group was assembled for this purpose in June 2009. Working group members include Stephen Abrams (California Digital Library), Albert Bertram (University of Michigan), Lynne Cameron (California Digital Library), Kaylea Champion (University of Chicago), Stephanie Collett (California Digital Library), Steve DiDomenico (Northwestern University), Bill Dueber (University of Michigan), Mike Durbin (Indiana University), Phil Farber (University of Michigan), Paul Fogel (California Digital Library), Eric Hetzner (California Digital Library), Sebastien Korner (University of Michigan), John Kunze (California Digital Library), David Loy (California Digital Library), Andy Mardesich (California Digital Library), Mairéad Martin (Pennsylvania State University), Jon Miller (University of Chicago), David Minor (San Diego Supercomputer Center), Bill Parod (Northwestern University), and Cory Snavely (University of Michigan). Updates on the working group's progress are available in the HathiTrust monthly updates, which are distributed on the second Friday of every month. A list of all updates is available at http://www.hathitrust.org/updates.

  • HathiTrust Storage

As part of long-term preservation and disaster recovery planning, HathiTrust convened a working group in July 2009 to investigate the need for a third instance of HathiTrust storage in the western United States to complement current storage locations in Ann Arbor, Michigan, and Indianapolis, Indiana. Members of the working group include Stephen Abrams (California Digital Library, co-chair), John Kunze (California Digital Library, co-chair), Luc Declerck (University of California San Diego), Rob Lowden (Indiana University), David Minor (University of California San Diego), and Cory Snavely (University of Michigan). The final report of the working group was submitted to the Executive Committee in January 2010 and is available here:

  • Quality, Ingest and Error Rate

In July 2009, the Strategic Advisory Board (SAB) assembled a working group to investigate issues surrounding the quality of partner institution volumes downloaded from Google. The working group was asked to research and recommend a quality threshold that HathiTrust could use to limit ingest of poor-quality volumes. The working group presented its recommendations to the SAB in January, and the SAB decided to continue the working group with a revised and expanded charge.

The new charge is to a) develop a set of quality principles for HathiTrust, b) monitor quality control as related to user experience, c) track developments in a separate quality working group established by Google and Google library partners following the Google partner summit in October, and d) evaluate HathiTrust practices with regard to thresholding or limiting ingested content.

Development Opportunities

The Update on October Activities published the first in a series of 'columns' outlining development opportunities or needs in HathiTrust that partner institutions have identified. These columns appear in the monthly updates and are also published below.

Ingest reporting

Description: The deposit of digital volumes and associated metadata into HathiTrust, referred to as “ingest,” involves a significant number of updates to administrative systems—bibliographic records added, digital volumes ingested, and access rights established. Many data elements will be of interest to the contributing institution, and each institution may drive local processes based on the current status of content in the repository (e.g., the percentage of in-copyright works may highlight the value of performing copyright determination work, or a low number of items available in the Google Return Interface may stimulate exploratory discussions with Google). A system that combines all of the available streams of administrative data into a simple web-based reporting system may have considerable value not only for transparency but also for local decision-making.

Resources available: Staff at the University of Michigan and the University of California have assembled a table of relevant data feeds with a brief description of each in the following document: http://bit.ly/2Jk5mm

Priority: moderate

Additional details: An institution that undertakes this work must:

  • outline a process for design and specifications with a group of interested HathiTrust partner libraries;
  • in consultation with partner libraries, give consideration to authentication and authorization needs for this system.
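
As a rough illustration of what combining administrative data streams into one report could look like, the following minimal sketch joins two feeds and summarizes them per institution. The feed layouts, field names (`volume_id`, `institution`, `rights`), and rights labels are assumptions made for this example only; they do not describe the actual HathiTrust data feeds.

```python
from collections import Counter, defaultdict

# Hypothetical sample feeds; the field names and values are illustrative
# assumptions, not the real HathiTrust administrative data formats.
ingest_feed = [
    {"volume_id": "vol-001", "institution": "Library A"},
    {"volume_id": "vol-002", "institution": "Library A"},
    {"volume_id": "vol-003", "institution": "Library A"},
    {"volume_id": "vol-004", "institution": "Library A"},
    {"volume_id": "vol-005", "institution": "Library B"},
]
rights_feed = [
    {"volume_id": "vol-001", "rights": "public-domain"},
    {"volume_id": "vol-002", "rights": "in-copyright"},
    {"volume_id": "vol-003", "rights": "in-copyright"},
    {"volume_id": "vol-005", "rights": "public-domain"},
]


def summarize(ingest_records, rights_records):
    """Join the two feeds on volume_id and summarize per institution."""
    rights_by_volume = {r["volume_id"]: r["rights"] for r in rights_records}
    counts = defaultdict(Counter)
    for record in ingest_records:
        # Volumes with no rights record yet are counted as "undetermined".
        rights = rights_by_volume.get(record["volume_id"], "undetermined")
        counts[record["institution"]]["ingested"] += 1
        counts[record["institution"]][rights] += 1

    report = {}
    for institution, c in counts.items():
        total = c["ingested"]
        report[institution] = {
            "ingested": total,
            "in_copyright_pct": round(100 * c["in-copyright"] / total, 1),
            "undetermined": c["undetermined"],
        }
    return report


report = summarize(ingest_feed, rights_feed)
for institution, row in sorted(report.items()):
    print(institution, row)
```

A real reporting system would read these feeds from the sources listed in the resources document and would add the authentication and authorization controls called for above; the sketch only shows the join-and-summarize step.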

Usage reporting

Description: A clearer sense of the level of use of library materials in HathiTrust will help shape extended activities such as collection management and further digitization. Volumes in HathiTrust may, in some cases, be read in their entirety, while in other cases they may only be searched. To what extent are search-only materials viewed? Which works that are fully viewable are displayed? Where does that access originate? As HathiTrust introduces authentication, to what extent do users authenticate to get access to a fuller array of services? How frequently is the HathiTrust catalog searched, and how does that use compare to the use of full text indexes? These are some of the questions that an improved service for usage reporting will help to answer.

Resources available: HathiTrust retains raw log data and registers some uses through Google Analytics.

Priority: moderate

Additional details: An institution that undertakes this work must:

  • clearly outline a commitment to take appropriate measures to protect user privacy (e.g., with regard to IP addresses and, at such time that HathiTrust implements Shibboleth, user authentication information). Such efforts should include secure storage of sensitive data, appropriate aggregation of data so as to anonymize use by specific individuals, and a commitment not to transfer private user data to a third party;
  • outline a process for design and specifications with a group of interested partner libraries;
  • give consideration to producing reports consistent with appropriate library community standards (e.g., COUNTER and SUSHI).
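
One way to read the aggregation requirement above is sketched below: raw log events are reduced to monthly counts per access type, the IP address is never copied into the output, and any count below a minimum size is suppressed so that a small cell cannot point to an individual. The event layout, the field names (`ip`, `date`, `access`), and the threshold of 3 are assumptions for illustration; they are not HathiTrust's log format or policy, and the sketch does not attempt COUNTER or SUSHI compliance.

```python
from collections import Counter

# Assumed suppression threshold: cells with fewer events are not reported.
MIN_CELL_SIZE = 3

# Hypothetical raw log events; field names are illustrative assumptions.
# The IP addresses come from the reserved documentation range 192.0.2.0/24.
raw_events = [
    {"ip": "192.0.2.10", "date": "2009-10-01", "access": "full_view"},
    {"ip": "192.0.2.11", "date": "2009-10-03", "access": "full_view"},
    {"ip": "192.0.2.10", "date": "2009-10-17", "access": "full_view"},
    {"ip": "192.0.2.12", "date": "2009-10-20", "access": "search_only"},
]


def aggregate_usage(events, min_cell_size=MIN_CELL_SIZE):
    """Return {(month, access_type): count}, omitting small cells.

    Only the month (YYYY-MM) and access type are read from each event,
    so no IP address or other per-user field reaches the report.
    """
    counts = Counter((e["date"][:7], e["access"]) for e in events)
    return {key: n for key, n in counts.items() if n >= min_cell_size}


usage_report = aggregate_usage(raw_events)
print(usage_report)
```

Aggregating before storage or sharing means the report can be distributed to partner libraries while the raw logs stay in secure storage.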

Upcoming Opportunities

  • Print holdings database
  • Ingest transformation