HathiTrust and Discovery

June 24, 2011

By John Wilkin, Executive Director, HathiTrust

It is a core tenet of HathiTrust that preservation cannot take place without access. The coupling of preservation and access is both philosophically and strategically central to HathiTrust’s mission, as awareness of the materials in our collections helps to create the value that leads to preservation. And because discovery is integral to access, HathiTrust has worked hard on a multi-pronged strategy for discovery.

Key to this strategy are our ongoing efforts to ensure that HathiTrust content is “in the flow” of library discovery more generally, as illustrated by our recent agreement to integrate the HathiTrust full text indexes into the Summon discovery service, and our collaboration with OCLC to create a permanent bibliographic catalog for HathiTrust.

The catalog as a tool for collection management

HathiTrust serves two primary constituencies: librarians as collection managers, and scholars and other users of our collections. This may seem like an artificial distinction—the lines between these two types of users and their discovery methods are often blurred, with bibliographically astute users wanting to look through the lens of the catalog, and reference librarians exhibiting some of the most sophisticated source-intensive research skills. Nevertheless, a central part of the work of libraries, and particularly the partner libraries, is collection management, and HathiTrust has as part of its design (both in its mission and goals) seamless integration into collection management strategies.

To best serve librarians as collection managers, a well-designed catalog is a critically important tool. A well-designed catalog for a collection manager always offers bibliographic precision. It allows the librarian to know (and find) exactly what is held, and also how that holding—that bibliographic instance—relates to other similar holdings. As we move into large-scale collection management across many of our cooperating libraries, this kind of well-designed catalog will play a critical role.

When HathiTrust launched its enterprise, we provided an extremely popular “temporary beta” catalog based on VuFind. It sported tremendous features like faceted results and the ability to sort results by date and rankings. It was well-received and reliable. At the same time, we announced a partnership with OCLC to build a replacement for this temporary beta, which we expect to launch sometime this year. Why replace the VuFind-based catalog, which works so well? Situating HathiTrust’s holdings in the larger OCLC WorldCat database is a tremendous boon to librarians in understanding what we have online, how the collections of the partner institutions relate to each other, and how those online holdings connect to libraries around the world. By managing HathiTrust’s records in the same place that other libraries do, we are better positioned to perform collection analysis and to shape future strategies to close gaps. In short, working with OCLC to build the HathiTrust catalog is an important strategy with regard to our collection management goals.

But should the creation of an effective catalog with OCLC cause us to abandon other bibliographic discovery strategies? Absolutely not. HathiTrust works in a number of ways to distribute bibliographic information to partners and the world. Our APIs allow libraries to add URLs to their catalogs where their library has a matching record. Our OAI distribution of brief records makes it possible for many libraries and other bibliographically-oriented entities to add records for materials unique to their collections. The hathifiles, an inventory of HathiTrust holdings now numbering approximately 9 million lines, can help drive institutional processes to identify materials and shape more sophisticated record-oriented strategies. And of course OCLC’s efforts to load information about HathiTrust holdings are also a boon for libraries wishing to get records from OCLC. The creation of a catalog is critical, but does not by itself fulfill users’ needs to find records in other discovery venues.
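As a rough illustration of how an inventory like the hathifiles could drive a local process, the sketch below matches a library's OCLC numbers against a tab-delimited inventory. The three-column layout (volume identifier, access status, OCLC number), the sample values, and the function name `match_holdings` are assumptions made for illustration only; the actual hathifiles layout should be checked against HathiTrust's own documentation.

```python
import csv
import io

# Hypothetical, simplified inventory: volume id, access status, OCLC number.
# The real hathifiles carry more fields; this layout is an assumption.
SAMPLE_INVENTORY = (
    "vol.0001\tallow\t1234567\n"
    "vol.0002\tdeny\t7654321\n"
    "vol.0003\tallow\t5555555\n"
)


def match_holdings(inventory_text, local_oclc_numbers):
    """Return {oclc_number: [(volume_id, access), ...]} for local matches."""
    wanted = set(local_oclc_numbers)
    matches = {}
    reader = csv.reader(io.StringIO(inventory_text), delimiter="\t")
    for row in reader:
        if len(row) < 3:
            continue  # skip malformed lines
        volume_id, access, oclc = row[0], row[1], row[2]
        if oclc in wanted:
            matches.setdefault(oclc, []).append((volume_id, access))
    return matches


# Example: one of the two local OCLC numbers appears in the inventory.
local = ["1234567", "9999999"]
result = match_holdings(SAMPLE_INVENTORY, local)
```

Holding the local numbers in a set keeps each lookup cheap, which matters when the inventory runs to millions of lines.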

Full text discovery and support of scholarship

HathiTrust’s full text strategy is very similar to its bibliographic discovery strategy, though it flips the paradigm a bit. After an extraordinary research and development effort, HathiTrust launched a full text search service in 2010, and ever since then we’ve been working to chart a course for a better, more sophisticated service. This summer (2011), we will launch a new full text search service that will incorporate fuller bibliographic information in the full text, use facets, and offer other features such as weighting of results depending on where the results were found in a text. And of course this will only be one more step in a process of continual enhancement.
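The idea of weighting results by where a match occurs can be sketched in a few lines. This is a toy illustration, not a description of how the HathiTrust search service is built: the field names, the weights, and the `score` function are all hypothetical.

```python
# Hypothetical weights: a hit in the title counts more than a hit in the body.
FIELD_WEIGHTS = {"title": 3.0, "author": 2.0, "body": 1.0}


def score(document, term):
    """Sum the weighted occurrence counts of `term` across a document's fields."""
    term = term.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = document.get(field, "").lower().split()
        total += weight * words.count(term)
    return total


docs = [
    {"id": "a", "title": "Whaling voyages", "author": "Smith",
     "body": "ships and sailors"},
    {"id": "b", "title": "Ships of the line", "author": "Jones",
     "body": "whaling is mentioned once"},
]

# Rank documents so that matches in more heavily weighted fields come first.
ranked = sorted(docs, key=lambda d: score(d, "whaling"), reverse=True)
```

Here the document with the term in its title outranks the one that mentions it only in the body, even though each contains the term once.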

While HathiTrust believes the catalog function must be in OCLC, where libraries already manage their records, we also insist that the full text service must be in HathiTrust, where the materials are managed. Therefore we will focus increasingly on the standalone HathiTrust full text search service as a vehicle for end-user discovery. That service will always work to distinguish itself from those offered by Google and other commercial providers by enabling scholars to search for information precisely and exhaustively. Appealing as it is, Google Search lacks both the precision and the complete recall that much scholarly work requires, and here HathiTrust must step up. After all, our collection of content is different from Google’s (with our locally-digitized content and content that comes from partnership with other large-scale digitization initiatives), and our academic orientation ensures that our search results are not influenced by a connection with commerce, such as advertising.

Just as our OCLC strategy does not end our pursuit of other bibliographic discovery strategies, our decision to mount a robust full text search service in HathiTrust does not eliminate the need to ensure discovery elsewhere. Because so much of our content is in Google Book Search and the Internet Archive, we achieve this goal in part without much additional effort. Still, much of the content in HathiTrust is only accessible in HathiTrust, and so getting in the flow of our users’ discovery methods (particularly users of academic and research library collections) is very important. By making the HathiTrust indexes searchable in Summon, we begin to accomplish this. Although Summon is the first and best of these services, the marketplace will produce others, and we remain committed to ensuring that our content is discoverable in as many of these services as possible. Negotiations are underway with Summon’s competitors, and press releases will follow as we conclude these agreements.

That which can be found is more likely to be preserved

In order to effectively support its preservation mission, HathiTrust must constantly improve the discovery experience and must seek to situate discovery wherever our users search for information. “Either/or” strategies are bound to fail us. Indeed, we will continue to implement a range of discovery strategies in collaboration with all appropriate partners and in every appropriate location. Our strong connection to scholars will lead us to refine the approaches we take to discovery, and our knowledge of where they seek information will guide the approaches we take to distributing records and making our full text indexes available. By making the information we store as discoverable as possible, we stand the greatest possible chance of having that information found, valued and preserved.