
Federal Documents Registry Progresses to Beta

August 23, 2016

We are pleased to note that the HathiTrust US Federal Documents Registry is now available as a beta release.

The Registry is intended to be a comprehensive source of metadata for the US federal documents corpus: material produced at government expense since 1789. While many potential use cases exist, an important one will be identifying materials that have not yet been digitized and/or deposited into the HathiTrust repository.

The Registry was conceived in 2012 as a mechanism to measure how far HathiTrust had progressed toward its goal of a comprehensive digital corpus, as outlined in the ballot initiative from the 2011 Constitutional Convention. In the fall of 2013, we issued a broad call for records. More than 40 libraries responded, and we received more than 25 million records. With such a large aggregation, the project team needed to develop multiple approaches for detecting and grouping duplicate records (records describing the same work).

The Registry was launched as a public alpha in June 2015. Since then, the team has worked steadily to improve duplicate detection, the Registry infrastructure, and the accessibility of the interface. We determined that the following conditions needed to be met before we could declare the Registry a beta release:

  • The Registry database should reside in a full production environment (not a development environment);

  • Each distinct work should have a “Registry record” with a persistent, unique identifier;

  • The Registry should receive regular metadata updates from the HathiTrust Digital Library;

  • The interface should meet the same accessibility standards that other HathiTrust interfaces meet.

Now that the Registry has reached this milestone, the project team will shift its focus to detecting gaps in the HathiTrust digital collection. We will continue to reduce the number of duplicate records in the Registry while also working to identify and fill gaps in Registry metadata. Staff will also use Registry data and HathiTrust members’ holdings data to identify materials to be digitized.

The Registry team thanks the University of Michigan’s LIT department for support over the past year; in particular, the Architecture and Engineering (infrastructure), Design & Discovery (accessibility audit), and Library Systems (data) departments.

More information on the Registry and our progress in addressing metadata challenges can be found in a recent ALA PAN Forum presentation as well as in “Detecting US Federal Documents to Expand Access,” a paper presented during the 2016 IFLA Congress.

Author(s): Valerie Glenn