Overview of Bibliographic Metadata Submission Process
Bibliographic records for HathiTrust items are managed by the University of California in a system called Zephir. For a full description of the submission process, including error reporting and other details, see the guide Submitting Metadata to Zephir. An overview of the process is given below.
Contributor submits sample records to Zephir via FTPS
- Contributors should contact email@example.com to initiate discussions about ingest, including bibliographic metadata. Through this process, contributors will be given account information (login and password) for an FTPS directory used to submit bibliographic data to Zephir. Instructions on using FTPS are in Appendix A of Submitting Metadata to Zephir. Contributors will initially be asked to submit sample records so that the Zephir team can evaluate how successfully those records are likely to load.
Contributor submits files of records via FTPS
- Once a contributor's sample files have been analyzed and their records have been prepared for submission, full files of bibliographic records can be submitted via FTPS to the Zephir “submissions” directory at ftps.cdlib.org (220.127.116.11). Contributors log in to this directory with their account name and password.
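For contributors who script their uploads, a minimal sketch using Python's standard `ftplib.FTP_TLS` might look like the following. It assumes the server accepts explicit FTPS (AUTH TLS) on the default port, which is the mode `ftplib` supports; if Zephir requires implicit FTPS or other settings, follow Appendix A of Submitting Metadata to Zephir instead. The helper name `upload_submission` and its `ftp_factory` parameter are hypothetical, not part of any Zephir tooling.

```python
import os
from ftplib import FTP_TLS

# Host and directory come from this guide. Explicit FTPS (AUTH TLS) is an
# ASSUMPTION; confirm connection details in Appendix A before relying on this.
ZEPHIR_HOST = "ftps.cdlib.org"
SUBMISSIONS_DIR = "submissions"


def upload_submission(path, user, password, host=ZEPHIR_HOST, ftp_factory=FTP_TLS):
    """Upload one metadata file to the Zephir submissions directory (hypothetical helper).

    ftp_factory is injectable so the function can be exercised without a network.
    """
    name = os.path.basename(path)
    # The context manager sends QUIT and closes the connection on exit.
    with ftp_factory(host) as ftp:
        ftp.login(user, password)  # FTP_TLS.login secures the control channel first
        ftp.prot_p()               # also encrypt the data channel
        ftp.cwd(SUBMISSIONS_DIR)
        with open(path, "rb") as fh:
            ftp.storbinary(f"STOR {name}", fh)  # binary upload under the same file name
    return name
```

Calling `upload_submission("uc_nrlf-ucsc_20120530_google.xml", user, password)` would upload that file under its own name; the same function can be used for the initial sample files.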
Contributors' files should:
- include no more than 100,000 records each
- contain only records from a single digitization agent (e.g., Google)
- follow this file name convention:
  - <metadata source code>_<configuration code>_<date>_<digitizing agent>_<other distinguishing data>.xml
  - Example: uc_nrlf-ucsc_20120530_google.xml
  - Note: metadata source codes and configuration codes will be provided to contributors, and "other distinguishing data" is optional.
- When contributors submit bibliographic records via FTPS, they should also send an email to firstname.lastname@example.org with the following transmission information, formatted in the message body exactly as shown so that it is machine-readable:
file name=<file name>
file size=<file size in bytes>
record count=<number of records>
notification email=<email address to which you would like your run notification sent>
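The file requirements and transmission fields above lend themselves to a quick check before upload. The Python sketch below is a hypothetical pre-flight helper (`preflight`, `count_marc_records`, and `FILENAME_RE` are illustrative names, not Zephir tooling). It assumes files are MARCXML, that source and configuration codes contain no underscores, and that `<date>` is YYYYMMDD as in the example file name. It does not verify the single-digitization-agent rule.

```python
import os
import re
import xml.etree.ElementTree as ET

MAX_RECORDS = 100_000  # per-file limit stated in this guide

# <metadata source code>_<configuration code>_<date>_<digitizing agent>[_<other>].xml
# ASSUMPTIONS: codes contain no underscores and <date> is YYYYMMDD,
# matching the example uc_nrlf-ucsc_20120530_google.xml.
FILENAME_RE = re.compile(
    r"^(?P<source>[^_]+)_(?P<config>[^_]+)_(?P<date>\d{8})_(?P<agent>[^_.]+)"
    r"(?:_(?P<other>[^.]+))?\.xml$"
)


def count_marc_records(path):
    """Count <record> elements in a MARCXML file, ignoring any namespace."""
    count = 0
    for _, elem in ET.iterparse(path):  # streams "end" events instead of building the full tree
        if elem.tag.rsplit("}", 1)[-1] == "record":
            count += 1
            elem.clear()  # discard the record's children once it has been counted
    return count


def preflight(path, notification_email):
    """Check a submission file and build the transmission message body.

    Returns (problems, body); problems is empty when the checks pass.
    """
    name = os.path.basename(path)
    count = count_marc_records(path)
    problems = []
    if not FILENAME_RE.match(name):
        problems.append(f"file name does not follow the naming convention: {name}")
    if count > MAX_RECORDS:
        problems.append(f"{count} records exceeds the limit of {MAX_RECORDS}")
    body = "\n".join([
        f"file name={name}",
        f"file size={os.path.getsize(path)}",  # size in bytes
        f"record count={count}",
        f"notification email={notification_email}",
    ])
    return problems, body
```

If `problems` is empty, the returned `body` can be pasted directly into the transmission email.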
When bibliographic records are loaded into Zephir, they are given a score based on the presence or absence of data in MARC metadata fields. When more than one institution deposits a record for an item, the record score determines which record is used in the HathiTrust catalog. See Zephir Record Scoring for details.
After files have been submitted and loaded into Zephir, contributors will receive an email notification about the loading run, typically 1-2 days after the files are received. The notification includes a brief run report and a histogram detailing MARC tag usage in the submitted records. Contributors can also retrieve more detailed run reports for each file run via FTPS. For more on what to expect in email notifications, run reports, and histograms, see Submitting Metadata to Zephir.
Bibliographic Record Corrections
Contributors may be requested to correct or enhance records in response to errors or deficits identified during loading or observed after records have been included in the system.
It is the general policy of HathiTrust not to alter the content of contributors' records, except where necessary to ensure the coordination of functions in the metadata management system (http://www.hathitrust.org/bib_metadata_correction).
Detailed information about how errors (both loading errors and user-observed errors) are reported back to contributors, and how contributors can address them, is included in Submitting Metadata to Zephir. Corrected records can be re-submitted following the process outlined above under "Contributor submits files of records via FTPS."