Registry Update May 2015-October 2015

The highlight of US Federal Government Documents Registry activities in the past six months was the June launch of the Registry. The alpha release is currently available at https://www.hathitrust.org/usdocs_registry.

There are currently 6,258,658 records in the Registry, contributed by more than 50 libraries. A complete list of contributors can be found on the About the Registry page.

Much of the project team’s work in May and early June was spent preparing the Registry for the alpha release. After the launch, focus turned to incorporating all usable contributed records, refining duplicate detection, improving search and display, and identifying records for out-of-scope materials.

Relationship detection

Currently, duplicate and related records are clustered together, with a representative record chosen for display. Matching is based on identifiers such as OCLC number, ISSN, ISBN, and Superintendent of Documents call number. The matching has been refined in the past couple of months, as identifiers are now more strictly normalized. Project staff continue to improve duplicate detection by exploring ways to parse item description information (enumeration and chronology), and to identify records for out-of-scope materials such as state and foreign government documents.
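The identifier-based clustering described above can be illustrated with a minimal sketch. It is not the Registry's actual code: the record layout (a dict with "id", "oclc", "issn", "isbn", and "sudoc" keys), the normalization rules, and the function names are all assumptions made for illustration. Records that share any normalized identifier, directly or through a chain of other records, end up in the same cluster.

```python
import re
from collections import defaultdict


def normalize_oclc(value):
    """Assumed rule: drop '(OCoLC)', 'ocm'/'ocn'/'on' prefixes and leading zeros."""
    digits = re.sub(r"^\s*(\(ocolc\))?\s*(ocm|ocn|on)?\s*", "",
                    value.strip(), flags=re.IGNORECASE)
    digits = digits.lstrip("0")
    return digits if digits.isdigit() else None


def normalize_standard_number(value):
    """Assumed rule for ISSN/ISBN: keep letters and digits only, uppercase."""
    cleaned = re.sub(r"[^0-9A-Za-z]", "", value).upper()
    return cleaned or None


def normalize_sudoc(value):
    """Assumed rule for SuDoc call numbers: remove whitespace, uppercase."""
    cleaned = re.sub(r"\s+", "", value).upper()
    return cleaned or None


# Hypothetical mapping from record field name to its normalizer.
NORMALIZERS = {
    "oclc": normalize_oclc,
    "issn": normalize_standard_number,
    "isbn": normalize_standard_number,
    "sudoc": normalize_sudoc,
}


def identifier_keys(record):
    """Return the set of (field, normalized value) pairs found in a record."""
    keys = set()
    for field, normalize in NORMALIZERS.items():
        for raw in record.get(field, []):
            normalized = normalize(raw)
            if normalized:
                keys.add((field, normalized))
    return keys


def cluster_records(records):
    """Group record ids that share at least one normalized identifier (union-find)."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # identifier key -> id of the first record carrying it
    for r in records:
        for key in identifier_keys(r):
            if key in first_seen:
                union(r["id"], first_seen[key])
            else:
                first_seen[key] = r["id"]

    clusters = defaultdict(list)
    for r in records:
        clusters[find(r["id"])].append(r["id"])
    return sorted(sorted(members) for members in clusters.values())


# Made-up sample records, for illustration only.
sample_records = [
    {"id": 1, "oclc": ["(OCoLC)ocm00012345"]},
    {"id": 2, "oclc": ["12345"], "sudoc": ["Y 4.J 89/1:"]},
    {"id": 3, "sudoc": ["y4.j89/1:"]},
    {"id": 4, "issn": ["0028-0836"]},
]
sample_clusters = cluster_records(sample_records)
```

In this sample, records 1 and 2 share an OCLC number once prefixes and leading zeros are stripped, and records 2 and 3 share a call number once spacing and case are ignored, so all three fall into one cluster while record 4 stands alone. Stricter normalization of this kind reduces missed matches, but choosing the representative record for a cluster is a separate decision that this sketch leaves out.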

Record Comparison

The two University of Washington iSchool students completed their work in early June. Their work provided valuable feedback for us and raised important questions about the relationships we are interested in displaying, as well as the presentation of data for comparison. Project staff plan to revisit their findings once the duplicate detection algorithm has been stabilized.

Search and Display

Project staff worked to improve the search and display based on user feedback. More fields are now displayed without having to view the MARC record, and several fields are linked, allowing users to identify related publishers and authors. The OCLC number takes users to the WorldCat record. A JSON record view was also added, and the MARC record display lists all of the records clustered together, not just the representative record chosen for public display.

Additional search options, including search by OCLC number and a limit to items currently in the HathiTrust Digital Library, are now available. Initial work has been done on an advanced search, which will allow users to search on multiple fields at once.

Gap Detection/Comprehensiveness

Automated gap detection and comprehensiveness analysis remain a challenge for several reasons, most notably the quality of the data, the varying nature of item description (enumeration and chronology), and the presence of records for out-of-scope materials.
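To show why varying item descriptions complicate gap detection, here is a small hedged sketch. It assumes, purely for illustration, that holdings of a serial are given as free-text enumeration strings and that a volume number can be pulled out with one simple pattern; the function name and the pattern are hypothetical, and real descriptions are far more varied than this pattern handles.

```python
import re

# Assumed pattern: 'v.1', 'vol. 2', 'V.4 1975' and similar volume markers.
VOLUME_PATTERN = re.compile(r"\bv(?:ol)?\.?\s*(\d+)", re.IGNORECASE)


def find_volume_gaps(descriptions):
    """Return (missing volume numbers, descriptions that could not be parsed)."""
    volumes = set()
    unparsed = []
    for text in descriptions:
        match = VOLUME_PATTERN.search(text)
        if match:
            volumes.add(int(match.group(1)))
        else:
            unparsed.append(text)
    if not volumes:
        return [], unparsed
    # Only gaps between the lowest and highest volume seen can be detected.
    expected = range(min(volumes), max(volumes) + 1)
    missing = [number for number in expected if number not in volumes]
    return missing, unparsed


# Made-up holdings strings, for illustration only.
gaps, unparsed = find_volume_gaps(["v.1", "vol. 2", "V.4 1975", "v.7", "1975 pt.2"])
```

Here volumes 3, 5, and 6 are reported as missing, while "1975 pt.2" cannot be interpreted at all. Every such unparsed string is a possible hidden holding, which is one reason data quality limits how far this kind of analysis can be automated.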

Staff have begun drafting a candidate list of proposed titles and government agencies to test for comprehensiveness, which will be shared with the HathiTrust Executive Director and the Federal Government Documents Advisory Group in the near future.

Outreach

Project staff participated in discussions with several interested parties, including:

  • Committee on Institutional Cooperation Heads of Government Publications
  • American Library Association Government Documents Round Table Cataloging Committee
  • Representatives from TRAIL (Technical Reports Archive and Image Library)
  • Steering Committee of the Association of Southeastern Research Libraries’ Collaborative Federal Depository Program

Project staff also provided a webinar for the HathiTrust Federal Government Documents Advisory Group and early testers, to walk them through the Registry, explain some current limitations, and ask for feedback and other potential use cases.

Future plans

We anticipate moving the Registry to beta in early 2016. The beta release will include a single Registry record with a unique identifier; a more accessible user interface; additional record export capabilities; and updates from the HathiTrust Digital Library that are closer to real time. We also plan to move forward with analysis, identifying those items which have records in the Registry but are not present in the HathiTrust Digital Library.
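The planned analysis amounts to comparing Registry records against what is already in the HathiTrust Digital Library. The sketch below shows one way that comparison could look, assuming (hypothetically) that each Registry record carries a list of already-normalized OCLC numbers and that the digitized holdings are available as a set of OCLC numbers. The record layout and function name are illustrative and not part of the Registry.

```python
def split_by_digital_presence(registry_records, digitized_oclc_numbers):
    """Return (ids with no digitized match, ids that cannot be checked)."""
    not_digitized = []
    no_identifier = []
    for record in registry_records:
        oclc_numbers = set(record.get("oclc", []))
        if not oclc_numbers:
            # Without an identifier, presence in the library is unknown.
            no_identifier.append(record["id"])
        elif oclc_numbers.isdisjoint(digitized_oclc_numbers):
            not_digitized.append(record["id"])
    return not_digitized, no_identifier


# Made-up data, for illustration only.
registry_sample = [
    {"id": "r1", "oclc": ["100"]},
    {"id": "r2", "oclc": ["200", "201"]},
    {"id": "r3", "oclc": []},
]
not_digitized, no_identifier = split_by_digital_presence(registry_sample, {"201"})
```

In this sample, "r1" is reported as not digitized, "r2" is matched through one of its two OCLC numbers, and "r3" is set aside because it has no identifier to compare. Keeping the unverifiable records separate avoids counting them as gaps.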

Please contact valglenn@umich.edu with any comments or questions.