U.S. Federal Documents Program Update March 2017

The following is a report on the progress of the HathiTrust U.S. Federal Documents Program, highlighting new developments in the areas of collection development, discovery and access, infrastructure, and outreach. Beginning with this report, Program updates will be provided on a quarterly basis.

Collection Profile Project

In Fall 2016, HathiTrust staff conducted an overall analysis of the current HathiTrust federal documents collection. The analysis was intended to inform collection development strategies, to test the feasibility of deriving a variety of metrics from the data available to us, and to establish a baseline for reporting on the collection. There have been no previous attempts to conduct this type of analysis on a specific HathiTrust collection. In addition to the overall analysis, we tested how comprehensively a few key federal document titles are represented in digitized form within HathiTrust. Early findings were presented at the HathiTrust Member Meeting. Details can be found in a new blog post and report on the project.

Another key outcome of the project is a new set of statistics, shown below as of March 1, 2017. We will update these numbers quarterly.

Federal Documents within the HathiTrust Digital Library, As of March 1, 2017

  • 415,076 bibliographic records
  • 981,335 separate digital objects
  • 389,864 monographs
  • 24,870 serial titles

Why new statistics? Hasn’t HathiTrust been reporting a count of federal documents on the website? The numbers reported on our site are drawn from HathiTrust bibliographic data. The collection profile project provided an opportunity to define the set of “federal documents” using both the HathiTrust bibliographic data and the Registry database, leveraging work we’ve done to create definitive records in the Registry.

Registry Assessment

Earlier this year, the Program Officer completed an assessment of the U.S. Federal Documents Registry in order to plan for greater use and further development if needed. The assessment concluded that we should not embark on a new phase of development, but should instead focus on actively using the Registry and, in the process, gain a finer understanding of how it might be developed further for HathiTrust’s needs. In the coming months, we will use it as a tool to engage with members on collection-building projects and requests (such as digitization projects and collection comparisons), and will leverage this work to specify further development needs. Support for digitization and collection development is a priority. To this end, Registry work is now focused on:

  • Determining the ideal Registry interactions in a digitization project’s lifecycle, building on our ability to generate picklists.
  • Continuing definitive identification of documents, creation of Registry records, and winnowing records that don’t belong.
  • Leveraging the Registry to pioneer improvements that will enrich HathiTrust discovery and grow the HathiTrust collection in the future, for example by using the Registry to provide name authorities and relationships for federal entities.
  • Specifying improvements needed to better provide collection comparisons, analysis, and data to members, in the course of using the Registry as a tool.
  • Maintaining the Registry web interface while overall HathiTrust interface improvement, shared print, and data provision plans incubate. Later in the year, we plan to review the Registry interface and HathiTrust UI and make recommendations regarding discovery of and access to federal documents, with the intent of improving and building upon existing functionality.

Collection-building

The Program Officer has been actively implementing the program and, in consultation with the Federal Documents Advisory Committee (FDAC), is tackling some of the key challenges we’ve identified. Some of the largest relate to collection-building. In addition to the Program plan, FDAC has been engaged with the Program Officer in:

  • Ensuring that federal documents are considered in the Shared Print Program, through a statement to SPAC.
  • Staking out a “collection framework” to delineate collection development priorities beyond opportunistic mass digitization.
  • Strategizing on potential digitization flows vis-à-vis collection goals.
  • Advising on and endorsing communications and outreach goals and messaging.

On the ground, the Program Officer has been pursuing digitization and collection-building. Some activities:

  • Continuing to encourage HathiTrust members’ digitization of federal documents, and meeting with them to identify opportunities for coordinating existing mass digitization projects.
  • Bringing together HathiTrust staff to outline an actionable digitization workflow, similar to the TRAIL project, at a very granular level. We are now seeking a member to pilot this workflow.
  • Engaging in numerous conversations with Google and Internet Archive, with an eye towards matching their capabilities to HathiTrust’s needs.
  • Compiling an inventory of digitization strategies and identifying the most actionable. The next step will be to seek input from FDAC and other appropriate HathiTrust groups on strategy.

Metadata projects

  • We are conducting a review of the HathiTrust rights determination process as it relates to federal documents, with the goal of finding ways to open up more federal documents, perhaps via metadata remediation.
  • HathiTrust staff have been in dialog with the CDL Zephir team to explore ways to improve guidance to partners on correctly identifying federal documents in bibliographic data at the time of ingest.
  • HathiTrust staff continue to improve the Registry data through identification and reconciliation of enum/chron data, as well as a new project to explore better identification of name authorities and relationships for federal entities. All of this work will enable us to better identify gaps in our collection and target materials for digitization.
  • Registry Analyst Valerie Glenn made significant contributions to the Metadata Use and Sharing Advisory Group (MUSAG) and to HathiTrust’s understanding of overall metadata flows.

Upcoming

High priorities for the coming months include further exploration of new digitization strategies and development of pilot project(s), focused projects to dig deeper into metadata issues in order to more accurately identify collection gaps, and work to find optimal ways to leverage the Registry to share data with HathiTrust members for collection analysis.