HathiTrust Metadata Management System

By 2012, the California Digital Library (University of California) will build a major component of HathiTrust: a system to manage HathiTrust partners’ bibliographic metadata.

The California Digital Library (CDL) is using HathiTrust funds to hire a metadata expert to support this development. CDL is also making an in-kind contribution of infrastructure and developer resources.

Goals

  • Provide equivalent metadata management functionality to the existing Aleph-based system.
  • Provide improved update, match and merge record management functionality to the HathiTrust.
  • Provide a flexible framework for managing metadata at multiple levels (e.g., work, manifestation, item).
  • Position the HathiTrust to respond to metadata management challenges raised by duplicate and surrogate records.

Executive Project Sponsors and Coordinators

Laine Farley, Executive Director, California Digital Library; HathiTrust Executive Committee member
John Wilkin, Executive Director of the HathiTrust

California Digital Library (University of California) Team

Lynne Cameron, UC HathiTrust Co-Technical Lead
Heather Christenson, UC HathiTrust Project Manager
Stephanie Collett, Technical Project Lead
Paul Fogel, UC HathiTrust Co-Technical Lead
Kathryn Stine, Metadata Analyst & Project Manager
Michael Thwaites, Programmer & Testing Coordinator
Lena Zentall, Project Manager

Timeline

Milestone [Progress]

Planning phase, April - October 2010 [Completed]

Project officially launches November 2010

Complete business arrangements & funds transfer [Completed]

Ongoing procedures for receiving input files and pre-ingest transformation procedures in place [Completed]

Core file system in place [Completed]

Core database in place [Completed]

Import (transformation) [Completed]

  • Routines in place to normalize and transform bibliographic data submitted by current content-contributing partners.

Milestone: Generic core system in place. [Completed; demonstrated 6/14/11]

Named the system "Zephir" [Completed]

(Preliminary) load and test records [Completed]

Reconcile differences between original contributor records and HathiTrust records [In progress]

Confirm ingest standards and workflows for contributing records (minimum submission standard, record correction policies & handling) [Completed]

Process (rights, daylight, preferred record score) [Completed]

  • Rights routines developed to record rights determinations, supporting incorporation of the HathiTrust Metadata Management System (HTMMS) into existing HathiTrust rights workflows.
  • Daylighting routines developed to determine when a HathiTrust object is fully processed and vetted and ready for incorporation into discovery and delivery systems.
  • Preferred record score script developed to heuristically score records and save the results in the core system. The preferred record score allows the base record to be identified at the point of export.

Process (batch exports) [Completed]

  • Routines in place to produce VuFind batch exports from the system.

Process (batch exports) [Completed]

  • Routines in place to produce Hathifiles batch exports from the system.

System adapted to HathiTrust workflow [Completed]

Load records

  • Development environment load target: early June 2013
  • Staging environment load target: early July 2013
  • Production environment load target: late July 2013

Functional and performance testing of system

Integration testing

System acceptance - Run systems in parallel [Begin August 2013]

Cutover to Zephir - System in production with the HathiTrust [Fall 2013]

Monthly Project Updates

December 2010

January 2011

February 2011

March 2011

April 2011

May 2011

June 2011

July 2011

August 2011

September 2011

October 2011

November 2011

December 2011

January 2012

February 2012

March 2012

April 2012

May 2012

June 2012

July 2012

August 2012

September 2012

October 2012

November 2012

December 2012

January 2013

February 2013

March 2013

 

Contact

Stephanie Collett, Technical Lead

Kathryn Stine, Project Manager