Codes Used Internally in Ingest Processes

Content contributors can use the following information to understand what each code means, how it is displayed and included in data outputs, and what the impacts of changing a code are.

Namespace

Contributors set and provide the identifiers for their content. Once in HathiTrust, a namespace is prepended to the contributor's identifier to form the HathiTrust identifier. A period (.) separates the namespace from the original identifier.

  • How it’s set: A contributor selects a namespace when HathiTrust staff determine that a new namespace is necessary. The filename for a submission information package (SIP) is the identifier. The identifier is also included in the associated bibliographic metadata record for that SIP. During ingest, content and bibliographic metadata loading scripts add the namespace to the contributor identifier.
  • How it’s displayed to end users: The HathiTrust identifier is displayed in the address bar when a user is looking at a book. It is also listed in the left sidebar under “Share.”  
  • See also:

Example:
A permanent URL for HathiTrust content is in the form <https://hdl.handle.net/2027/ucbk.ark:/28722/h2t88c>. The HathiTrust identifier in this example is:

ucbk.ark:/28722/h2t88c

The namespace in this HathiTrust identifier is: ucbk
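
The split described above can be sketched in a few lines of Python. This is an illustrative sketch, not HathiTrust code: the function names `split_htid` and `htid_from_handle` are hypothetical, and the sketch assumes that a namespace never contains a period, so the first period in the identifier is always the separator.

```python
from __future__ import annotations

# Handle prefix taken from the permanent URL form shown in the example above.
HANDLE_PREFIX = "https://hdl.handle.net/2027/"


def split_htid(htid: str) -> tuple[str, str]:
    """Split a HathiTrust identifier into (namespace, contributor identifier).

    Assumption: the namespace contains no period, so only the FIRST period
    is treated as the separator. The contributor identifier may itself
    contain periods.
    """
    namespace, separator, local_id = htid.partition(".")
    if not separator or not namespace or not local_id:
        raise ValueError(f"not a namespaced identifier: {htid!r}")
    return namespace, local_id


def htid_from_handle(url: str) -> str:
    """Return the HathiTrust identifier embedded in a permanent URL."""
    if not url.startswith(HANDLE_PREFIX):
        raise ValueError(f"unexpected URL form: {url!r}")
    return url[len(HANDLE_PREFIX):]


if __name__ == "__main__":
    htid = htid_from_handle("https://hdl.handle.net/2027/ucbk.ark:/28722/h2t88c")
    print(htid)              # ucbk.ark:/28722/h2t88c
    print(split_htid(htid))  # ('ucbk', 'ark:/28722/h2t88c')
```

Splitting on the first period only keeps contributor identifiers that contain their own periods intact.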

Digitization Agent Code

The digitization agent code indicates the organization that digitized the item. Some digitization agents have access restrictions.

  • How it’s set: During setup of a new content stream, HathiTrust staff will determine if a new digitization agent code is necessary. The preferred pattern for selecting new codes is to use the website domain for an organization. After a new digitization agent code is set, HathiTrust staff incorporate it in content and metadata loading scripts. Contributors control the digitization agent for a specific item by including the correct code in bibliographic record filenames submitted to Zephir. Contributors with multiple content streams should make sure that the correct digitization agent code is used in filenames.
  • How it’s displayed to end users: The digitization agent is displayed in the user interface as a watermark on page images in the format “Digitized by XXXX.” Items that are born-digital will not have a watermark.
  • Data outputs: The digitization agent code is output in the HathiTrust MARC records in field 974, subfield ‡s. It is also output in the hathifiles.
  • See also: Potential code values are described at https://www.hathitrust.org/institution_identifiers

Example:

For the item https://hdl.handle.net/2027/mdp.39015067033145 the watermark “Digitized by Google” is displayed on the page images.

The associated MARC record at https://catalog.hathitrust.org/Record/000057148.marc contains the following field:

974 ⊔ ⊔ ‡bMIU ‡cMIU ‡d20170608 ‡sgoogle ‡umdp.39015067033145 ‡z1981 ‡y1981 ‡rcc-by-sa-4.0 ‡qcon

The subfield ‡s contains the value “google” which is the digitization agent code for this item.
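
As a rough illustration, the subfields of a 974 field can be pulled out of the display form shown above with a small Python helper. This sketch assumes the field is available as text with "‡" as the subfield delimiter, exactly as printed in the example; it is not a parser for binary MARC or MARCXML, and `parse_subfields` is a hypothetical name.

```python
from __future__ import annotations


def parse_subfields(field_text: str, delimiter: str = "‡") -> dict[str, str]:
    """Map each subfield code to its value for one displayed MARC field.

    Assumption: each subfield starts with the delimiter followed by a
    one-character code. If a code repeats, the last value wins.
    """
    subfields: dict[str, str] = {}
    # The text before the first delimiter is the tag and indicators; skip it.
    for chunk in field_text.split(delimiter)[1:]:
        chunk = chunk.strip()
        if chunk:
            subfields[chunk[0]] = chunk[1:]
    return subfields


if __name__ == "__main__":
    field_974 = (
        "974 ⊔ ⊔ ‡bMIU ‡cMIU ‡d20170608 ‡sgoogle ‡umdp.39015067033145 "
        "‡z1981 ‡y1981 ‡rcc-by-sa-4.0 ‡qcon"
    )
    subfields = parse_subfields(field_974)
    print(subfields["s"])  # google
    print(subfields["u"])  # mdp.39015067033145
```

The same helper returns any other subfield of the field by its code, for example `subfields["c"]`.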

Content Provider Code

The content provider code identifies the organization that owned or held the original item before it was digitized and/or submitted to HathiTrust. Contributors that have deposited content on behalf of another organization can capture multiple values for this code to record a chain of provenance. Only one value can be displayed to users.

  • How it’s set: During setup of a new content stream, HathiTrust staff will determine if a new content provider code is necessary. The preferred pattern for selecting new codes is to use the website domain for an organization. After a new content provider code is set, it is mapped to a collection code (see below) as well. HathiTrust staff incorporate the content provider code in content loading scripts. Contributors control the content provider code for a specific item by including the correct configuration code (see below) in bibliographic record filenames submitted to Zephir.
  • How it’s displayed to end users: The content provider is displayed in the user interface as a watermark on page images in the format “Original from XXXX.” It is displayed in a catalog record in the format “Original from XXXX.” Users can also facet by the content provider in catalog and full-text searches using the “Original Location” facet.
  • Data outputs: The content provider code is output in the hathifiles.
  • See also: Potential code values are described at https://www.hathitrust.org/institution_identifiers

Example:

For the item https://hdl.handle.net/2027/mdp.39015067033145 the watermark “Original from University of Michigan” is displayed on the page images.

Responsible Entity

The responsible entity code identifies the organization that took responsibility for the deposit of the item into HathiTrust and has ongoing custodial responsibility for the item. In most cases this is identical to the content provider.

  • How it’s set: During setup of a new content stream, HathiTrust staff will determine if a new responsible entity code is necessary. The preferred pattern for selecting new codes is to use the website domain for an organization. After a new responsible entity code is set, it is mapped to a collection code (see below) as well. HathiTrust staff incorporate the responsible entity code in content loading scripts. Contributors control the responsible entity for a specific item by including the correct configuration code (see below) in bibliographic record filenames submitted to Zephir.
  • How it’s displayed to end users: The responsible entity is not displayed to users in the interface. It is an administrative data element.
  • Data outputs: The responsible entity code is output in the hathifiles.
  • See also: Potential code values are described at https://www.hathitrust.org/institution_identifiers

 

Collection Code

The purpose of the collection code is primarily to share information between Zephir and the HathiTrust Repository. The collection code maps to a unique combination of content provider, responsible entity and licensing terms. Multiple configuration codes may map to a collection code.

  • How it’s set: During setup of a new content stream, HathiTrust staff will determine if a new collection code is necessary. The preferred pattern for selecting new collection codes is to use the MARC organization code. HathiTrust staff incorporate the collection code in content and metadata loading scripts. Contributors control the responsible entity and content provider codes for a specific item by including the correct configuration code in bibliographic record filenames submitted to Zephir. That configuration code maps to the appropriate content provider and responsible entity codes. Contributors with multiple content streams should make sure that the correct configuration code is used in filenames.
  • How it’s displayed to end users: The collection code is not displayed to users.
  • Data outputs: The collection code is output in the HathiTrust MARC records in field 974, subfield ‡c. It is also output in the hathifiles.

Example:

For the item https://hdl.handle.net/2027/mdp.39015067033145 the associated MARC record at https://catalog.hathitrust.org/Record/000057148.marc contains the following field:

974 ⊔ ⊔ ‡bMIU ‡cMIU ‡d20170608 ‡sgoogle ‡umdp.39015067033145 ‡z1981 ‡y1981 ‡rcc-by-sa-4.0 ‡qcon

The subfield ‡c contains the value "MIU" which is the collection code for this item.

Configuration Code

The configuration code describes a specific configuration that has been created in Zephir for a contributor’s records. A new configuration and corresponding code are needed when there are significant changes to a contributor’s records (e.g., a contributor has undertaken a system migration, the OCLC number is recorded in a new field, etc.) or when there is a new content provider code or responsible entity code. Configuration codes map to collection codes.

  • How it’s set: Zephir staff will determine when a new configuration is needed. Contributors indicate which configuration their files should be loaded into by adding the appropriate configuration code to their bibliographic record filenames. The configuration code is only available in Zephir and is not transmitted downstream to the HathiTrust Repository.
  • How it’s displayed to end users: The configuration code is not displayed to end users.
  • Data outputs: The configuration code is not included in any data outputs.
  • See also: Code values are described in the document “Submitting Metadata to Zephir”
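
The relationship between the codes on this page (several configuration codes can map to one collection code, which in turn maps to one combination of content provider, responsible entity and licensing terms) can be pictured with a small Python sketch. Every code value below is a made-up placeholder, not a real HathiTrust or Zephir value, and the names `Collection`, `COLLECTIONS`, `CONFIGURATIONS` and `resolve` are hypothetical; the actual mappings are maintained by HathiTrust and Zephir staff.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    """The combination of values that one collection code stands for."""
    content_provider: str
    responsible_entity: str
    licensing_terms: str


# Hypothetical collection code -> its unique combination of values.
COLLECTIONS: dict[str, Collection] = {
    "XYZ": Collection("example-library", "example-library", "example-terms"),
}

# Hypothetical configuration codes; more than one may share a collection code.
CONFIGURATIONS: dict[str, str] = {
    "example-config-old-system": "XYZ",
    "example-config-new-system": "XYZ",
}


def resolve(configuration_code: str) -> tuple[str, Collection]:
    """Follow configuration code -> collection code -> combination of values."""
    collection_code = CONFIGURATIONS[configuration_code]
    return collection_code, COLLECTIONS[collection_code]


if __name__ == "__main__":
    code, collection = resolve("example-config-new-system")
    print(code)                         # XYZ
    print(collection.content_provider)  # example-library
```

In this picture, a system migration on the contributor side adds a new entry to `CONFIGURATIONS` only, while a new content provider or responsible entity requires a new entry in both tables.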