
HathiTrust Quality Assurance and Standards Working Group

Completed in July 2019

HathiTrust is committed to providing a high-quality digitized corpus for all its stakeholders. With quality assurance and advancement affirmed as a priority by the membership, the Program Steering Committee (PSC) established the HathiTrust Quality Assurance and Standards Working Group. This group built on the initial work of the ad hoc Quality Working Group, convened in August 2015. That group offered recommendations on the development of a quality metadata schema, called for quality indicators (or “signals”) visible to end users, and developed and shared documentation for Google partners to use when replacing and inserting pages in Google-scanned volumes. The Quality Assurance and Standards Working Group reported to the PSC, with liaisons to the Collections Committee and the Metadata Policy, Strategy, Use and Sharing Advisory Group (MUSAG).

Accomplishments 

The group demonstrated deep technical knowledge of the digital content in the HathiTrust repository, including active and intimate knowledge of Google systems and practices. It also drew on complementary expertise in areas including metadata, end user support, systems development, and digitization data models. Key accomplishments:

  • Development and distribution of a metadata schema or data dictionary that could be used to capture the various quality metadata signals that exist across the HathiTrust metadata ecosystem.   

  • Investigation of the relative (subjective) quality of Google materials that had historically been excluded (gated) from the HathiTrust repository. The group delivered a report of its findings to the PSC, along with the recommendation that any remaining object gating be suspended. HathiTrust staff implemented the group's recommendation, allowing the previously excluded content into the repository.

Charge (2016)

The HathiTrust Quality Assurance and Standards Working Group is charged to recommend strategies, processes and techniques detailing how members, end users, and/or HathiTrust Operations can contribute to the quality of the corpus by making scalable improvements to digital surrogate fidelity at the item/object level. This includes, but is not limited to, making recommendations on the development of a quality certification program, identifying and reporting errors and omissions, making corrections, and running pilot tests to monitor the quality of ingested content. Out of scope are collection development-related quality improvements (e.g., collection completeness, underrepresented collection areas, etc.).

Specifically, the Working Group is charged to conduct the following initial and longer-term work:

Initial Work

  • Inventory existing sources of quality metadata (GRIN, METS, OCR scoring, user feedback, locally gathered information, etc.) and recommend an approach, methodology and/or system for the collection, aggregation and storing of relevant quality metadata from available sources;

  • Liaise with MUSAG and the Zephir team, which have primary purview over bibliographic metadata quality, on assessing the requirements for and development of a metadata schema for the quality status or characterization of items in the corpus, including the recording of quality issues reported by members and end users;

  • Work with HathiTrust Operations staff to test and monitor the quality level of content that is still gated (i.e., disallowed from ingest into the repository) due to quality issues; and

  • Review HathiTrust standards and specifications related to ingest and quality assurance of the digital objects in the corpus, and recommend any needed changes to gating practices based on the findings.

Longer-Term Work

  • Liaise with the Collections Committee to identify categories or specific content that should be given priority for quality improvement efforts, and to recommend actions, including remediation designs and pilot testing, that would advance this work;

  • Evaluate the need for and potential value of a quality certification program;

  • Liaise with MUSAG to propose and, ultimately, develop and test methods for displaying to users the identified quality attributes (or “signals,” positive and negative) of specific items, relative to intended purposes (e.g., fitness for print-disabled users with respect to OCR accuracy, verified completeness, etc.);

  • Develop, document, and monitor procedures for page insertions and inclusion of non-Google and other locally digitized supplementary or replacement content; and

  • Communicate quality improvement initiatives and liaise with the proposed Digital Object Quality Corrections subgroup of the User Support Working Group, which intends to focus on responding to quality problems flagged by users through the HathiTrust User Support mechanism, according to the standards and policies set.

Timeline

The Working Group will conduct its initial work from August 2016 through July 2017 and will then consult with the PSC, which will review possible adjustments to priorities and workplans.

Communications and Guidance

The PSC will appoint one or two of its members to serve as attending liaisons to the Working Group. Issues, requests, and informational reporting between the Working Group and the PSC will be conveyed primarily through these members. The PSC also requests brief periodic progress reports (e.g., quarterly) from the Working Group chair, which may lead to occasional invitations to discuss the group's work with the PSC at its biweekly meeting.

Membership

  • Paul Fogel, California Digital Library, Chair
  • Aaron Elkiss, University of Michigan
  • Natalie Fulkerson, HathiTrust
  • Janet Gertz, Columbia University
  • Peter Gorman, University of Wisconsin-Madison
  • Kat Hagedorn, University of Michigan
  • Sandra McIntyre, HathiTrust
  • Michelle Paolillo, Cornell University
  • Evviva Weinraub, Northwestern University
  • Angelina Zaytsev, HathiTrust

Selected Background Documents

Archival Quality and Long-term Preservation: A Research Framework for Validating the Usefulness of Digital Surrogates; by Paul Conway; 2011
https://deepblue.lib.umich.edu/handle/2027.42/86643

HathiTrust Program Steering Committee Planning Brief: HathiTrust Quality and Validation Issues; September 2014
https://drive.google.com/a/umn.edu/file/d/0B-kkSzFLihRaX2h3dTh0aTJPWVE/view

HathiTrust Commitment to Quality Statement; June 2016
https://www.hathitrust.org/quality

HathiTrust Quality Assessment and Validation: Report of the Visibility and Feedback Working Group
https://docs.google.com/document/d/1oo71S022-nTowvnmzTuSxPahErVcKcer2s98IvZkCqM/edit

HathiTrust Quality Validation Group: Foldouts and Missing Pages Working Group
https://docs.google.com/document/d/1oDJXOg72426_4CDahnWlOxWb6YpO6gpVwpGqz9NOTHQ/edit

Quality in HathiTrust; blog posting by Jeremy York (HathiTrust) and Kat Hagedorn (University of Michigan Library); May 13, 2015
https://www.hathitrust.org/quality-in-hathitrust