HathiTrust Data API

Introduction (DRAFT - Rev. 0.7)

This document describes a RESTful API that provides access to HathiTrust repository data and metadata resources. The HathiTrust Repository Data (HTD) API is referred to simply as the API in this document.

NOTE: This document is in draft status and is presented for comment.

Quick Overview

The HTD API provides extensible, efficient and secure access to the data and metadata resources of the HathiTrust Repository. The design intent is to support client applications that already have an item identifier and simply need the corresponding data (or metadata). It should make services and uses possible beyond those available through current applications. Examples of current applications are the HathiTrust Collection Builder and Pageturner.

Applications that need a large number of metadata records or data sets should request them iteratively.

The repository resources consist of digitized print or born-digital volumes composed of page images and OCR text and corresponding structural and administrative metadata. Other APIs provide sources of identifiers and bibliographic metadata. Examples include:

The API accepts a request for a resource and returns XML, JSON or binary representations of the resource. The available representations depend on the resource in question.

The resources served by the API are partitioned into classes that have varying access policies.

  • meta, pagemeta and structure - Accessible without restriction. No authentication or authorization required.
  • aggregate, pageimage and pageocr - Access varies:
    • Open to download without restriction, OR
    • Available to download with some restrictions on rate of access but otherwise open, OR
    • Download requires client authentication to and authorization by the API.

Search and browse are seen as separate services available through a different API than described here.

URI scheme

Concrete examples are provided in each section describing a resource below. Square brackets indicate an optional parameter. Throughout, variables are UPPERCASE prefixed with a colon, e.g. :VAR. XPath notation for elements and attributes appears occasionally.

http[s]://services.hathitrust.org/api/htd/:RESOURCE/:ID[/:SEQ]?[v=:N][&alt=json[&callback=:CALLBACK]]

 

Access to restricted resources is over SSL using the https:// protocol.

The values for the :RESOURCE variable for version 1 are:

  • meta
  • structure
  • aggregate
  • pageimage
  • pageocr
  • pagemeta


The :ID variable ranges over all namespace-qualified barcodes or other logical identifiers for repository objects. Examples of namespaces are mdp, miun, wu.

The :SEQ variable is an integer starting at 1 and ranges up to the number of page images in the object.

The version query parameter (v) supports backward compatibility as the API evolves. Clients can explicitly specify the version of the requests they are issuing and of the response that they want. By default, when no version parameter is supplied, responses are in the format of the latest version of the API.

Where application/xml responses apply, alt=json requests the response in JSON format. In that case, an optional JavaScript callback function name can be supplied. The default response format is application/xml.
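As a rough illustration of the template above, the following Python sketch assembles request URIs. The function name build_uri and its keyword arguments are hypothetical. Two behaviors are assumptions that this document does not state: that the page-level resources always take a :SEQ, and that the :ID should be percent-encoded.

```python
from urllib.parse import quote, urlencode

BASE = "services.hathitrust.org/api/htd"
RESOURCES = {"meta", "structure", "aggregate", "pageimage", "pageocr", "pagemeta"}
# Assumption: only these resources accept (and need) a page sequence number.
PAGE_RESOURCES = {"pageimage", "pageocr", "pagemeta"}


def build_uri(resource, item_id, seq=None, version=None, as_json=False,
              callback=None, secure=False):
    """Assemble a request URI following the template in this section."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource}")
    if resource in PAGE_RESOURCES:
        if seq is None or seq < 1:
            raise ValueError(":SEQ must be an integer >= 1 for page resources")
    elif seq is not None:
        raise ValueError(f"{resource} does not take a :SEQ")
    if callback and not as_json:
        raise ValueError("callback only applies together with alt=json")

    scheme = "https" if secure else "http"
    # Assumption: percent-encode the identifier; plain barcodes pass through unchanged.
    uri = f"{scheme}://{BASE}/{resource}/{quote(item_id, safe='')}"
    if seq is not None:
        uri += f"/{seq}"

    params = []
    if version is not None:
        params.append(("v", version))
    if as_json:
        params.append(("alt", "json"))
        if callback:
            params.append(("callback", callback))
    return uri + ("?" + urlencode(params) if params else "")
```

For example, build_uri("structure", "mdp.39015064570875", version=1) yields the Structure example URI used later in this document.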

HTD Extension Elements, Attributes and Schema

The schema for the XML responses is based on the Atom Syndication Format in the spirit of the response schema for a volume from the Google Book Search Data API as shown in the Data API: Reference Guide.

XML responses are formatted as atom:entry elements in the default atom namespace. The required atom:id, atom:title, atom:updated elements are present. The HTD API schema extends the Atom schema by defining and using the htd namespace.

Note that the use of the atom:entry element is adopted in the context of access to data and not necessarily of access to a feed.

The HTD API is a data API with accompanying structural and administrative metadata. It is not a bibliographic metadata API. The atom:title element contains text that describes the entry and is not the title of the book. For example,

HathiTrust Repository Data API - single page metadata. 

The schema employs a URI-based scheme for additional values of the atom:link[@rel] attribute. For resource identifiers we have:

  • http://schemas.hathitrust.org/htd/2009#meta
  • http://schemas.hathitrust.org/htd/2009#pagemeta
  • http://schemas.hathitrust.org/htd/2009#structure
  • http://schemas.hathitrust.org/htd/2009#aggregate
  • http://schemas.hathitrust.org/htd/2009#pageimage
  • http://schemas.hathitrust.org/htd/2009#pageocr

The optional element atom:link appears with the rel=alternate and rel=self attributes.

  • link[@rel='alternate'] - Generally taken to mean the permalink to the content pointed to by the entry. Currently this includes a link to the HathiTrust pageturner, which is quasi-permanent, and a link to the Handle Server. For example,
http://babel.hathitrust.org/cgi/pt?id=:ID[&seq=:SEQ]

and

http://hdl.handle.net/2027/:ID
  • link[@rel='self'] - This is the preferred URI for retrieving the entry itself. This value is important in scenarios where only the entry is available and not the location from which the entry was retrieved.
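A client might collect the atom:link elements of a response by their rel value. The sketch below uses Python's standard xml.etree.ElementTree; the function name links_by_rel and the trimmed SAMPLE entry are illustrative, and shortening HTD rel URIs to their fragment is a convenience chosen here, not something the API does.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
HTD_REL = "http://schemas.hathitrust.org/htd/2009#"

# Trimmed, illustrative entry modelled on the example responses in this document.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom">
<link rel="alternate" href="http://hdl.handle.net/2027/mdp.39015070515765" type="text/html"/>
<link rel="self" href="http://services.hathitrust.org/api/htd/meta/mdp.39015070515765" type="application/atom+xml"/>
<link rel="http://schemas.hathitrust.org/htd/2009#aggregate" href="https://services.hathitrust.org/api/htd/aggregate/mdp.39015070515765" type="application/zip"/>
</entry>"""


def links_by_rel(entry_xml):
    """Return {rel: [href, ...]}; HTD rel URIs are shortened to their fragment."""
    root = ET.fromstring(entry_xml)
    links = {}
    for link in root.findall(ATOM + "link"):
        rel = link.get("rel", "")
        if rel.startswith(HTD_REL):
            rel = rel[len(HTD_REL):]
        # A rel may occur more than once (e.g. two alternate links), so keep a list.
        links.setdefault(rel, []).append(link.get("href"))
    return links


links = links_by_rel(SAMPLE)
```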

Extension Elements

Extension elements are in the htd namespace and vary with response. Refer to example abstract responses below.

  • htd:version - the version number of the API generating the response
  • htd:selected_seq - the page sequence number requested. (pagemeta resource only.)
  • htd:numpages - the number of pages in the volume
  • htd:access[@resource] - asserts whether downloading the page images, OCR and zipped data is restricted to authenticated and authorized client applications or open but rate-limited or freely available. Metadata access does not require authentication and authorization. Restricted or limited access does not imply restricted viewability. Restricted data may be freely viewable or viewable only under certain conditions, e.g. based on geographical location of the client or the brittle status of the book or the authenticated identity of the user. This means the client application must make the final viewability determination and is bound by the agreements that permit downloading restricted items. Attribute values:
    • http://schemas.hathitrust.org/htd/2009#open
    • http://schemas.hathitrust.org/htd/2009#limited
    • http://schemas.hathitrust.org/htd/2009#restricted
  • htd:rights - container element for rights metadata: htd:id, htd:namespace, htd:attr, htd:reason, htd:source, htd:user, htd:time and htd:note
  • htd:pgmap - container element for page number to page sequence number map
    • htd:pg[@pgnum] - the mapping element, one for each page number. The attribute is the page number; the content is the page sequence number.
  • htd:seqmap - container element for map of page sequence number to page number, feature, format.
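    • htd:seq[@pseq] - one for each page. The attribute is the sequence number of the page; in the example responses the page number, image format and page features appear as the htd:pnum, htd:imgfmt and htd:pfeat child elements listed next.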
    • htd:pnum - the page number either printed or implicit (if available)
    • htd:imgfmt - format of the page image: tiff or jp2 or jpg
    • htd:pfeat - the page feature key (if available):
      • CHAPTER_START
      • COPYRIGHT
      • FIRST_CONTENT_CHAPTER_START
      • FRONT_COVER
      • INDEX
      • REFERENCES
      • TABLE_OF_CONTENTS
      • TITLE
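To make the two maps concrete, here is a minimal Python sketch that reads htd:seqmap and htd:pgmap from a response using xml.etree.ElementTree. The function name read_page_maps, the returned dictionary layout, and the trimmed SAMPLE are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"htd": "http://schemas.hathitrust.org/htd/2009"}

# Trimmed, illustrative entry modelled on the example responses in this document.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
 xmlns:htd="http://schemas.hathitrust.org/htd/2009">
<htd:seqmap>
<htd:seq pseq="1"><htd:pnum></htd:pnum><htd:pfeat>FRONT_COVER</htd:pfeat><htd:imgfmt>image/jp2</htd:imgfmt></htd:seq>
<htd:seq pseq="2"><htd:pnum>i</htd:pnum><htd:pfeat>IMPLICIT_PAGE_NUMBER</htd:pfeat><htd:imgfmt>image/jp2</htd:imgfmt></htd:seq>
</htd:seqmap>
<htd:pgmap><htd:pg pgnum="i">2</htd:pg></htd:pgmap>
</entry>"""


def read_page_maps(entry_xml):
    """Return (seqmap, pgmap) as plain dictionaries."""
    root = ET.fromstring(entry_xml)
    seqmap = {}
    for seq in root.findall("htd:seqmap/htd:seq", NS):
        seqmap[int(seq.get("pseq"))] = {
            # An empty htd:pnum means no page number is available for this page.
            "pnum": seq.findtext("htd:pnum", default="", namespaces=NS) or None,
            "features": [feat.text for feat in seq.findall("htd:pfeat", NS)],
            "imgfmt": seq.findtext("htd:imgfmt", namespaces=NS),
        }
    # Page number (a string such as "i" or "7") -> page sequence number.
    pgmap = {pg.get("pgnum"): int(pg.text)
             for pg in root.findall("htd:pgmap/htd:pg", NS)}
    return seqmap, pgmap


seqmap, pgmap = read_page_maps(SAMPLE)
```

Page numbers are kept as strings because they may be roman numerals or otherwise non-numeric, while sequence numbers are always integers.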

Schema

Refer to Example Abstract Responses in each Resource section for more information.

Resources and Representations

The API provides access to the following resources. The MIME types of the available representations are shown in the table below. An example URI is provided. An example abstract response is shown for resources with application/xml representations.

Resource                              Representation(s)/MIME type(s)
Volume and Rights Metadata (meta)     application/atom+xml & application/json
METS (structure)                      application/xml & application/json
zip file (aggregate)                  application/zip
Single Page Metadata (pagemeta)       application/atom+xml & application/json
Single Page Image (pageimage)         image/jp2 | image/tiff | image/jpg
Single Page OCR (pageocr)             text/plain

 

Volume and Rights Metadata

This resource consists of:

  • API version number
  • access values
  • count of page image / OCR text pairs
  • a row of the rights database consisting of the data from the following fields: id, namespace, attr, reason, source, user, time, note, as described in the database layout document
  • a map of page sequence number to:
    • page number, either explicitly on the printed page or algorithmically derived during digitization
    • page feature tags as defined by the label attribute of the METS:structMap/METS:div element of the Structure (METS document) resource. See Extension Elements and METS schema.
    • page image file format, one of tiff or jp2 or jpg
  • a map of page number to page sequence number

Note: Page feature and page number metadata is not available for some instances of this resource.

Compare with Single Page Metadata

Example URI

Resource request for the volume and rights metadata for a public domain, Google-digitized book in response format of application/atom+xml

http://services.hathitrust.org/api/htd/meta/mdp.39015070515765

 

Example Response

Edited for brevity.

<entry 
xmlns="http://www.w3.org/2005/Atom"
xmlns:htd="http://schemas.hathitrust.org/htd/2009">
<link
rel="alternate"
href="http://hdl.handle.net/2027/mdp.39015070515765"
type="text/html"/>
<link
rel="self"
href="http://services.hathitrust.org/api/htd/meta/mdp.39015070515765"
type="application/atom+xml"/>
<link
rel="http://schemas.hathitrust.org/htd/2009#aggregate"
href="https://services.hathitrust.org/api/htd/aggregate/mdp.39015070515765"
type="application/zip"/>
<link
rel="http://schemas.hathitrust.org/htd/2009#structure"
href="http://services.hathitrust.org/api/htd/structure/mdp.39015070515765"
type="application/xml"/>
<htd:version>1</htd:version>
<htd:numpages>668</htd:numpages>
<htd:seqmap>
<htd:seq pseq="1">
<htd:pnum></htd:pnum>
<htd:pfeat>FRONT_COVER</htd:pfeat>
<htd:pfeat>IMAGE_ON_PAGE</htd:pfeat>
<htd:pfeat>UNTYPICAL_PAGE</htd:pfeat>
<htd:pfeat>IMPLICIT_PAGE_NUMBER</htd:pfeat>
<htd:imgfmt>image/jp2</htd:imgfmt>
</htd:seq>
<htd:seq pseq="2">
<htd:pnum>i</htd:pnum>
<htd:pfeat>UNTYPICAL_PAGE</htd:pfeat>
<htd:pfeat>IMPLICIT_PAGE_NUMBER</htd:pfeat>
<htd:imgfmt>image/jp2</htd:imgfmt>
</htd:seq>
<htd:seq pseq="3">
<htd:pnum>ii</htd:pnum>
<htd:pfeat>IMPLICIT_PAGE_NUMBER</htd:pfeat>
<htd:imgfmt>image/jp2</htd:imgfmt>
</htd:seq>
</htd:seqmap>
<htd:pgmap>
<htd:pg pgnum="i">2</htd:pg>
<htd:pg pgnum="ii">3</htd:pg>
</htd:pgmap>
<id>http://services.hathitrust.org/api/htd/meta/mdp.39015070515765</id>
<title>HathiTrust Repository Data API - metadata</title>
<updated>2009-03-11T17:29:58.602-0400</updated>
<htd:access resource="pageocr">
http://schemas.hathitrust.org/htd/2009#limited</htd:access>
<htd:access resource="pageimage">
http://schemas.hathitrust.org/htd/2009#limited</htd:access>
<htd:access resource="aggregate">
http://schemas.hathitrust.org/htd/2009#restricted</htd:access>
<htd:rights>
<htd:note/>
<htd:user>jhovater</htd:user>
<htd:time>2008-07-09T00:30:11</htd:time>
<htd:namespace>mdp</htd:namespace>
<htd:source>1</htd:source>
<htd:attr>1</htd:attr>
<htd:id>39015070515765</htd:id>
<htd:reason>1</htd:reason>
</htd:rights>
</entry>

 

RELAX NG Schema - Compact

default namespace = "http://www.w3.org/2005/Atom"
namespace htd = "http://schemas.hathitrust.org/htd/2009"
start =
element entry {
element link {
attribute href { xsd:anyURI },
attribute rel { xsd:anyURI },
attribute type { text }
}+,
element htd:version { xsd:integer },
element htd:numpages { xsd:integer },
element htd:seqmap {
element htd:seq {
attribute pseq { xsd:integer },
element htd:pnum { text },
element htd:pfeat { xsd:NCName }+,
element htd:imgfmt { text }
}+
},
element htd:pgmap {
element htd:pg {
attribute pgnum { xsd:NCName },
xsd:integer
}+
},
element id { xsd:anyURI },
element title { text },
element updated { xsd:NMTOKEN },
element htd:access {
attribute resource { xsd:NCName },
xsd:anyURI
}+,
element htd:rights {
element htd:note { empty },
element htd:user { xsd:NCName },
element htd:time { xsd:NMTOKEN },
element htd:namespace { xsd:NCName },
element htd:source { xsd:integer },
element htd:attr { xsd:integer },
element htd:id { xsd:integer },
element htd:reason { xsd:integer }
}
}

 

Single Page Metadata

This resource consists of a partial reiteration of the volume and rights metadata for the book together with the page feature metadata available for the given sequential page.

Example URL

Resource request for the metadata for the 11th sequential page of an in-copyright, Google-digitized book in response format of application/json. Compare with Volume and Rights Metadata.

https://services.hathitrust.org/api/htd/pagemeta/mdp.39015005102796/11?alt=json

 

Example Response

{
"xmlns": "http://www.w3.org/2005/Atom",
"xmlns:htd": "http://schemas.hathitrust.org/htd/2009",
"id": "http://services.hathitrust.org/api/htd/pagemeta/mdp.39015005102796/11",
"title":"HathiTrust Repository Data API - single page metadata",
"updated": "2009-03-12T09:48:43.885-0400",
"link": [
{
"rel": "alternate",
"href": "http://hdl.handle.net/2027/mdp.39015005102796",
"type": "text/html"
},
{
"rel": "self",
"href": "http://services.hathitrust.org/api/htd/pagemeta/mdp.39015005102796/11",
"type": "application/atom+xml"
},
{
"rel": "http://schemas.hathitrust.org/htd/2009#pageimage",
"href": "https://services.hathitrust.org/api/htd/pageimage/mdp.39015005102796/11",
"type": "image/tiff"
},
{
"rel": "http://schemas.hathitrust.org/htd/2009#pageocr",
"href": "https://services.hathitrust.org/api/htd/pageocr/mdp.39015005102796/11",
"type": "text/plain"
},
{
"rel": "http://schemas.hathitrust.org/htd/2009#aggregate",
"href": "https://services.hathitrust.org/api/htd/aggregate/mdp.39015005102796",
"type": "application/zip"
},
{
"rel": "http://schemas.hathitrust.org/htd/2009#structure",
"href": "http://services.hathitrust.org/api/htd/structure/mdp.39015005102796",
"type": "application/xml"
},
{
"rel": "http://schemas.hathitrust.org/htd/2009#meta",
"href": "http://services.hathitrust.org/api/htd/meta/mdp.39015005102796",
"type": "application/atom+xml"
}
],
"htd:version": "1",
"htd:selected_seq": "11",
"htd:numpages": "262",
"htd:seqmap": [
{
"htd:seq": {
"htd:imgfmt": "image/tiff",
"htd:pnum": "7",
"htd:pfeat": [
"FIRST_CONTENT_CHAPTER_START",
"IMPLICIT_PAGE_NUMBER"
],
"pseq": "11"
}
}
],
"htd:access": [
{
"content": "http://schemas.hathitrust.org/htd/2009#restricted",
"resource": "pageimage"
},
{
"content": "http://schemas.hathitrust.org/htd/2009#restricted",
"resource": "pageocr"
},
{
"content": "http://schemas.hathitrust.org/htd/2009#restricted",
"resource": "aggregate"
}
],
"htd:rights": {
"htd:user": "jhovater",
"htd:note": {},
"htd:time": "2008-08-14T22:30:23",
"htd:namespace": "mdp",
"htd:source": "1",
"htd:attr": "2",
"htd:id": "39015005102796",
"htd:reason": "1"
},
"htd:pgmap": [
{
"htd:pg": {
"content": "11",
"pgnum": "7"
}
}
]
}
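A JSON response of this shape can be consumed with any JSON parser. The Python sketch below is illustrative only: the helper name access_by_resource and the trimmed SAMPLE are assumptions. Note that in the example response every value, including numbers, arrives as a string.

```python
import json

# Trimmed, illustrative document modelled on the example response above.
SAMPLE = """{
"htd:version": "1",
"htd:selected_seq": "11",
"htd:numpages": "262",
"htd:access": [
{"content": "http://schemas.hathitrust.org/htd/2009#restricted", "resource": "pageimage"},
{"content": "http://schemas.hathitrust.org/htd/2009#restricted", "resource": "pageocr"},
{"content": "http://schemas.hathitrust.org/htd/2009#restricted", "resource": "aggregate"}
]
}"""


def access_by_resource(doc):
    """Map each resource name to the fragment of its htd:access URI."""
    return {item["resource"]: item["content"].rsplit("#", 1)[-1]
            for item in doc.get("htd:access", [])}


doc = json.loads(SAMPLE)
selected_seq = int(doc["htd:selected_seq"])  # numeric fields are strings
access = access_by_resource(doc)
```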

 

RELAX NG Schema - Compact

default namespace = "http://www.w3.org/2005/Atom"
namespace htd = "http://schemas.hathitrust.org/htd/2009"
start =
element entry {
element link {
attribute href { xsd:anyURI },
attribute rel { xsd:anyURI },
attribute type { text }
}+,
element htd:version { xsd:integer },
element htd:selected_seq { xsd:integer },
element htd:numpages { xsd:integer },
element htd:seqmap {
element htd:seq {
attribute pseq { xsd:integer },
element htd:pnum { text },
element htd:pfeat { xsd:NCName }+,
element htd:imgfmt { text }
}
},
element htd:pgmap {
element htd:pg {
attribute pgnum { text },
xsd:integer
}
},
element id { xsd:anyURI },
element title { text },
element updated { xsd:NMTOKEN },
element htd:access {
attribute resource { xsd:NCName },
xsd:anyURI
}+,
element htd:rights {
element htd:note { empty },
element htd:user { xsd:NCName },
element htd:time { xsd:NMTOKEN },
element htd:namespace { xsd:NCName },
element htd:source { xsd:integer },
element htd:attr { xsd:integer },
element htd:id { xsd:integer },
element htd:reason { xsd:integer }
}
}

 

Structure

This resource is currently a METS document representing the volume. The application/xml representation for the METS portion of this resource is described by the METS schema. This resource gives the client application the most detailed picture of the aggregate repository object.

Example URI

Resource request for the METS document for a public domain (US), Google-digitized book in response format of application/xml where the version of the API is explicitly requested:

http://services.hathitrust.org/api/htd/structure/mdp.39015064570875?v=1

 

Example Response

Edited for brevity.

<entry 
xmlns="http://www.w3.org/2005/Atom"
xmlns:htd="http://schemas.hathitrust.org/htd/2009">
<link
rel="alternate"
href="http://hdl.handle.net/2027/mdp.39015064570875"
type="text/html"/>
<link
rel="self"
href="http://services.hathitrust.org/api/htd/structure/mdp.39015064570875"
type="application/xml"/>
<link
rel="http://schemas.hathitrust.org/htd/2009#aggregate"
href="https://services.hathitrust.org/api/htd/aggregate/mdp.39015064570875"
type="application/zip"/>
<link
rel="http://schemas.hathitrust.org/htd/2009#meta"
href="http://services.hathitrust.org/api/htd/meta/mdp.39015064570875"
type="application/xml"/>
<htd:version>1</htd:version>
<id>http://services.hathitrust.org/api/htd/structure/mdp.39015064570875</id>
<title>HathiTrust Repository Data API - METS</title>
<updated>2009-03-12T13:33:59.933-0400</updated>
<htd:access resource="pageocr">http://schemas.hathitrust.org/htd/2009#limited
</htd:access>
<htd:access resource="pageimage">http://schemas.hathitrust.org/htd/2009#limited
</htd:access>
<htd:access resource="aggregate">http://schemas.hathitrust.org/htd/2009#restricted
</htd:access>
<htd:rights>
<htd:note/>
<htd:user>jhovater</htd:user>
<htd:time>2007-09-10T09:30:04</htd:time>
<htd:namespace>mdp</htd:namespace>
<htd:source>1</htd:source>
<htd:attr>9</htd:attr>
<htd:id>39015064570875</htd:id>
<htd:reason>1</htd:reason>
</htd:rights>
<METS:mets
xmlns:METS="http://www.loc.gov/METS/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:PREMIS="http://www.loc.gov/standards/premis"
xsi:schemaLocation="http://www.loc.gov/METS/
http://www.loc.gov/standards/mets/mets.xsd
http://purl.org/dc/elements/1.1/"
OBJID="mdp.39015064570875"
xml:base="/sdr1/obj/mdp/pairtree_root/39/01/50/64/57/08/75/39015064570875/39015064570875.mets.xml">
<METS:metsHdr ID="mdp.39015064570875" CREATEDATE="2008-06-05T16:06:23" RECORDSTATUS="NEW">
<METS:agent ROLE="CREATOR" TYPE="ORGANIZATION">
<METS:name>DLPS</METS:name>
</METS:agent>
</METS:metsHdr>
<METS:dmdSec ID="DMD1">
<METS:mdRef
MDTYPE="MARC"
LOCTYPE="OTHER"
OTHERLOCTYPE="Item ID stored as second call number in item record"
XPTR="mdp.39015064570875"/>
</METS:dmdSec>
<METS:amdSec>
<METS:techMD ID="TMD1">
<METS:mdRef
LOCTYPE="OTHER"
OTHERLOCTYPE="SYSTEM"
MDTYPE="OTHER"
OTHERMDTYPE="text"
LABEL="production notes"
xlink:href="notes.txt"/>
</METS:techMD>
<METS:techMD ID="TMD2">
<METS:mdRef
LOCTYPE="OTHER"
OTHERLOCTYPE="SYSTEM"
MDTYPE="OTHER"
OTHERMDTYPE="text"
LABEL="page metadata"
xlink:href="pagedata.txt"/>
</METS:techMD>
<METS:techMD ID="premisobject1">
<METS:mdWrap MDTYPE="PREMIS">
<METS:xmlData>
<PREMIS:object>
<PREMIS:preservationLevel>1</PREMIS:preservationLevel>
</PREMIS:object>
</METS:xmlData>
</METS:mdWrap>
</METS:techMD>
<METS:digiprovMD ID="premisevent1">
<METS:mdWrap MDTYPE="PREMIS">
<METS:xmlData>
<PREMIS:event>
<PREMIS:eventIdentifier>
<PREMIS:eventIdentifierValue>capture1</PREMIS:eventIdentifierValue>
</PREMIS:eventIdentifier>
<PREMIS:eventType>capture</PREMIS:eventType>
<PREMIS:eventDateTime>2007-06-18T00:00:00</PREMIS:eventDateTime>
<PREMIS:linkingAgentIdentifier>
<PREMIS:linkingAgentIdentifierType>AgentID</PREMIS:linkingAgentIdentifierType>
<PREMIS:linkingAgentIdentifierValue>Google, Inc.</PREMIS:linkingAgentIdentifierValue>
</PREMIS:linkingAgentIdentifier>
</PREMIS:event>
<PREMIS:event>
<PREMIS:eventIdentifier>
<PREMIS:eventIdentifierValue>compression1</PREMIS:eventIdentifierValue>
</PREMIS:eventIdentifier>
<PREMIS:eventType>compression</PREMIS:eventType>
<PREMIS:eventDateTime>2007-09-05T10:09:00</PREMIS:eventDateTime>
<PREMIS:linkingAgentIdentifier>
<PREMIS:linkingAgentIdentifierType>AgentID</PREMIS:linkingAgentIdentifierType>
<PREMIS:linkingAgentIdentifierValue>Google, Inc.</PREMIS:linkingAgentIdentifierValue>
</PREMIS:linkingAgentIdentifier>
</PREMIS:event>
<PREMIS:event>
<PREMIS:eventIdentifier>
<PREMIS:eventIdentifierValue>decryption1</PREMIS:eventIdentifierValue>
</PREMIS:eventIdentifier>
<PREMIS:eventType>decryption</PREMIS:eventType>
<PREMIS:eventDateTime>2007-09-08T02:57:45</PREMIS:eventDateTime>
<PREMIS:linkingAgentIdentifier>
<PREMIS:linkingAgentIdentifierType>AgentID</PREMIS:linkingAgentIdentifierType>
<PREMIS:linkingAgentIdentifierValue>UM</PREMIS:linkingAgentIdentifierValue>
</PREMIS:linkingAgentIdentifier>
</PREMIS:event>
<PREMIS:event>
<PREMIS:eventIdentifier>
<PREMIS:eventIdentifierValue>fixity check1</PREMIS:eventIdentifierValue>
</PREMIS:eventIdentifier>
<PREMIS:eventType>fixity check</PREMIS:eventType>
<PREMIS:eventDateTime>2007-09-08T02:57:45</PREMIS:eventDateTime>
<PREMIS:eventOutcomeInformation>
<PREMIS:eventOutcomeDetail>pass</PREMIS:eventOutcomeDetail>
</PREMIS:eventOutcomeInformation>
<PREMIS:linkingAgentIdentifier>
<PREMIS:linkingAgentIdentifierType>AgentID</PREMIS:linkingAgentIdentifierType>
<PREMIS:linkingAgentIdentifierValue>UM</PREMIS:linkingAgentIdentifierValue>
</PREMIS:linkingAgentIdentifier>
</PREMIS:event>
<PREMIS:event>
<PREMIS:eventIdentifier>
<PREMIS:eventIdentifierValue>ingestion1</PREMIS:eventIdentifierValue>
</PREMIS:eventIdentifier>
<PREMIS:eventType>ingestion</PREMIS:eventType>
<PREMIS:eventDateTime>2007-09-08T02:57:45</PREMIS:eventDateTime>
<PREMIS:linkingAgentIdentifier>
<PREMIS:linkingAgentIdentifierType>AgentID</PREMIS:linkingAgentIdentifierType>
<PREMIS:linkingAgentIdentifierValue>UM</PREMIS:linkingAgentIdentifierValue>
</PREMIS:linkingAgentIdentifier>
</PREMIS:event>
<PREMIS:event>
<PREMIS:eventIdentifier>
<PREMIS:eventIdentifierValue>message digest calculation1</PREMIS:eventIdentifierValue>
</PREMIS:eventIdentifier>
<PREMIS:eventType>message digest calculation</PREMIS:eventType>
<PREMIS:eventDateTime>2007-09-05T10:09:00</PREMIS:eventDateTime>
<PREMIS:eventDetail>jhove1_1e</PREMIS:eventDetail>
<PREMIS:linkingAgentIdentifier>
<PREMIS:linkingAgentIdentifierType>AgentID</PREMIS:linkingAgentIdentifierType>
<PREMIS:linkingAgentIdentifierValue>Google, Inc.</PREMIS:linkingAgentIdentifierValue>
</PREMIS:linkingAgentIdentifier>
</PREMIS:event>
<PREMIS:event>
<PREMIS:eventIdentifier>
<PREMIS:eventIdentifierValue>validation1</PREMIS:eventIdentifierValue>
</PREMIS:eventIdentifier>
<PREMIS:eventType>validation</PREMIS:eventType>
<PREMIS:eventDateTime>2007-09-08T02:57:45</PREMIS:eventDateTime>
<PREMIS:linkingAgentIdentifier>
<PREMIS:linkingAgentIdentifierType>AgentID</PREMIS:linkingAgentIdentifierType>
<PREMIS:linkingAgentIdentifierValue>UM</PREMIS:linkingAgentIdentifierValue>
</PREMIS:linkingAgentIdentifier>
</PREMIS:event>
</METS:xmlData>
</METS:mdWrap>
</METS:digiprovMD>
</METS:amdSec>
<METS:fileSec>
<METS:fileGrp ID="FG1" USE="zip archive">
<METS:file ID="ZIP00000001" MIMETYPE="application/zip" SEQ="00000001" CREATED="2008-06-05T16:06:23" SIZE="7065774" CHECKSUM="b26aff19a616a83eefd0ffcb43be10b0" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="39015064570875.zip"/>
</METS:file>
</METS:fileGrp>
<METS:fileGrp ID="FG2" USE="image">
<METS:file ID="IMG00000001" MIMETYPE="image/tiff" SEQ="00000001" CREATED="2007-09-05T13:07:51" SIZE="1498" CHECKSUM="ad1491cf5c381b7752e5b53f3b50621c" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000001.tif"/>
</METS:file>
<METS:file ID="IMG00000002" MIMETYPE="image/jp2" SEQ="00000002" CREATED="2007-09-05T13:07:52" SIZE="37689" CHECKSUM="3c6300639e57f0e9305f6130565aab51" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000002.jp2"/>
</METS:file>
<METS:file ID="IMG00000003" MIMETYPE="image/tiff" SEQ="00000003" CREATED="2007-09-05T13:07:52" SIZE="1972" CHECKSUM="b097fa8086b852c07f169b63c3064cc8" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000003.tif"/>
</METS:file>
<METS:file ID="IMG00000004" MIMETYPE="image/tiff" SEQ="00000004" CREATED="2007-09-05T13:07:52" SIZE="1970" CHECKSUM="8d82918c9d351b07f92d84d0d2c98897" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000004.tif"/>
</METS:file>
</METS:fileGrp>
<METS:fileGrp ID="FG3" USE="ocr">
<METS:file ID="TXT00000001" MIMETYPE="text/plain" SEQ="00000001" CREATED="2007-09-05T13:07:51" SIZE="0" CHECKSUM="d41d8cd98f00b204e9800998ecf8427e" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000001.txt"/>
</METS:file>
<METS:file ID="TXT00000002" MIMETYPE="text/plain" SEQ="00000002" CREATED="2007-09-05T13:07:52" SIZE="52" CHECKSUM="226bdcccdb2089e12129a77c137a0eef" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000002.txt"/>
</METS:file>
<METS:file ID="TXT00000003" MIMETYPE="text/plain" SEQ="00000003" CREATED="2007-09-05T13:07:52" SIZE="0" CHECKSUM="d41d8cd98f00b204e9800998ecf8427e" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000003.txt"/>
</METS:file>
<METS:file ID="TXT00000004" MIMETYPE="text/plain" SEQ="00000004" CREATED="2007-09-05T13:07:52" SIZE="0" CHECKSUM="d41d8cd98f00b204e9800998ecf8427e" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000004.txt"/>
</METS:file>
</METS:fileGrp>
</METS:fileSec>
<METS:structMap ID="SM1" TYPE="physical">
<METS:div TYPE="volume">
<METS:div ORDER="1" TYPE="page" LABEL="FRONT_COVER, IMPLICIT_PAGE_NUMBER, MISSING_PAGE">
<METS:fptr FILEID="IMG00000001"/>
<METS:fptr FILEID="TXT00000001"/>
</METS:div>
<METS:div ORDER="2" TYPE="page" LABEL="IMAGE_ON_PAGE, UNTYPICAL_PAGE, IMPLICIT_PAGE_NUMBER">
<METS:fptr FILEID="IMG00000002"/>
<METS:fptr FILEID="TXT00000002"/>
</METS:div>
<METS:div ORDER="3" TYPE="page" LABEL="BLANK, IMPLICIT_PAGE_NUMBER">
<METS:fptr FILEID="IMG00000003"/>
<METS:fptr FILEID="TXT00000003"/>
</METS:div>
<METS:div ORDER="4" TYPE="page" LABEL="BLANK, IMPLICIT_PAGE_NUMBER">
<METS:fptr FILEID="IMG00000004"/>
<METS:fptr FILEID="TXT00000004"/>
</METS:div>
</METS:div>
</METS:structMap>
</METS:mets>
</entry>
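One way a client might use the METS document is to join the physical structMap to the fileSec, yielding the files and feature labels for each page. The Python sketch below shows that join with xml.etree.ElementTree; the function name pages_from_mets, the tuple layout, and the trimmed SAMPLE are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Trimmed, illustrative METS fragment modelled on the example response above.
SAMPLE = """<METS:mets xmlns:METS="http://www.loc.gov/METS/"
 xmlns:xlink="http://www.w3.org/1999/xlink">
<METS:fileSec>
<METS:fileGrp ID="FG2" USE="image">
<METS:file ID="IMG00000001" MIMETYPE="image/tiff">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000001.tif"/>
</METS:file>
</METS:fileGrp>
<METS:fileGrp ID="FG3" USE="ocr">
<METS:file ID="TXT00000001" MIMETYPE="text/plain">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000001.txt"/>
</METS:file>
</METS:fileGrp>
</METS:fileSec>
<METS:structMap ID="SM1" TYPE="physical">
<METS:div TYPE="volume">
<METS:div ORDER="1" TYPE="page" LABEL="FRONT_COVER, IMPLICIT_PAGE_NUMBER">
<METS:fptr FILEID="IMG00000001"/>
<METS:fptr FILEID="TXT00000001"/>
</METS:div>
</METS:div>
</METS:structMap>
</METS:mets>"""


def pages_from_mets(xml_text):
    """Return [(order, [labels], [file hrefs])] for every page-level div."""
    root = ET.fromstring(xml_text)
    # Index the file section: file ID -> file name from FLocat/@xlink:href.
    files = {}
    for file_el in root.iter(METS + "file"):
        files[file_el.get("ID")] = file_el.find(METS + "FLocat").get(XLINK_HREF)
    pages = []
    for div in root.iter(METS + "div"):
        if div.get("TYPE") != "page":
            continue
        # LABEL holds a comma-separated list of page feature keys.
        labels = [s.strip() for s in div.get("LABEL", "").split(",") if s.strip()]
        hrefs = [files[fptr.get("FILEID")] for fptr in div.findall(METS + "fptr")]
        pages.append((int(div.get("ORDER")), labels, hrefs))
    return pages


pages = pages_from_mets(SAMPLE)
```

Because the function searches with iter(), it works whether it is handed the bare METS:mets element or the enclosing atom:entry.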

 

RELAX NG Schema - Compact

default namespace = "http://www.w3.org/2005/Atom"
namespace METS = "http://www.loc.gov/METS/"
namespace PREMIS = "http://www.loc.gov/standards/premis"
namespace htd = "http://schemas.hathitrust.org/htd/2009"
namespace xlink = "http://www.w3.org/1999/xlink"
namespace xsi = "http://www.w3.org/2001/XMLSchema-instance"
start =
element entry {
element link {
attribute href { xsd:anyURI },
attribute rel { xsd:anyURI },
attribute type { text }
}+,
element htd:version { xsd:integer },
element id { xsd:anyURI },
element title { text },
element updated { xsd:NMTOKEN },
element htd:access {
attribute resource { xsd:NCName },
xsd:anyURI
}+,
element htd:rights {
element htd:note { empty },
element htd:user { xsd:NCName },
element htd:time { xsd:NMTOKEN },
element htd:namespace { xsd:NCName },
element htd:source { xsd:integer },
element htd:attr { xsd:integer },
element htd:id { xsd:integer },
element htd:reason { xsd:integer }
},
element METS:mets {
attribute OBJID { xsd:NCName },
attribute xsi:schemaLocation { text },
attribute xml:base { text },
element METS:metsHdr {
attribute CREATEDATE { xsd:NMTOKEN },
attribute ID { xsd:NCName },
attribute RECORDSTATUS { xsd:NCName },
element METS:agent {
attribute ROLE { xsd:NCName },
attribute TYPE { xsd:NCName },
element METS:name { xsd:NCName }
}
},
element METS:dmdSec {
attribute ID { xsd:NCName },
mdRef
},
element METS:amdSec {
element METS:techMD {
attribute ID { xsd:NCName },
(mdRef | mdWrap)
}+,
element METS:digiprovMD {
attribute ID { xsd:NCName },
mdWrap
}
},
element METS:fileSec {
element METS:fileGrp {
attribute ID { xsd:NCName },
attribute USE { text },
element METS:file {
attribute CHECKSUM { text },
attribute CHECKSUMTYPE { xsd:NCName },
attribute CREATED { xsd:NMTOKEN },
attribute ID { xsd:NCName },
attribute MIMETYPE { text },
attribute SEQ { xsd:integer },
attribute SIZE { xsd:integer },
element METS:FLocat {
attribute LOCTYPE { xsd:NCName },
attribute OTHERLOCTYPE { xsd:NCName },
attribute xlink:href { xsd:NMTOKEN }
}
}+
}+
},
element METS:structMap {
attribute ID { xsd:NCName },
attribute TYPE { xsd:NCName },
\div
}
}
}
mdRef =
element METS:mdRef {
attribute LABEL { text }?,
attribute LOCTYPE { xsd:NCName },
attribute MDTYPE { xsd:NCName },
attribute OTHERLOCTYPE { text },
attribute OTHERMDTYPE { xsd:NCName }?,
attribute XPTR { xsd:NCName }?,
attribute xlink:href { xsd:NCName }?
}
mdWrap =
element METS:mdWrap {
attribute MDTYPE { xsd:NCName },
element METS:xmlData {
element PREMIS:object {
element PREMIS:preservationLevel { xsd:integer }
}
| element PREMIS:event {
element PREMIS:eventIdentifier {
element PREMIS:eventIdentifierValue { text }
},
element PREMIS:eventType { text },
element PREMIS:eventDateTime { xsd:NMTOKEN },
(element PREMIS:eventDetail { xsd:NCName }
| element PREMIS:eventOutcomeInformation {
element PREMIS:eventOutcomeDetail { xsd:NCName }
})?,
element PREMIS:linkingAgentIdentifier {
element PREMIS:linkingAgentIdentifierType { xsd:NCName },
element PREMIS:linkingAgentIdentifierValue { text }
}
}+
}
}
\div =
element METS:div {
attribute LABEL { text }?,
attribute ORDER { xsd:integer }?,
attribute TYPE { xsd:NCName },
(\div+
| element METS:fptr {
attribute FILEID { xsd:NCName }
}+)?
}

 

Aggregate

This resource is a zip file sent as application/zip. Currently this resource has only one structure:

  • for each page in the resource:
    • the page image
    • corresponding UTF-8 encoded OCR plain text

Example URI

Resource request for the zip file for an in-copyright, Google-digitized book in response format of application/zip. Note the https:// protocol.

https://services.hathitrust.org/api/htd/aggregate/mdp.39015002110867
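Once downloaded, the zip file can be unpacked into image and OCR pairs. This document does not specify the member names inside the archive, so the Python sketch below assumes they follow the file names seen in the METS example (such as 00000001.tif and 00000001.txt) and pairs members that share a base name. The function name pair_pages is hypothetical, and the archive is built in memory purely for demonstration.

```python
import io
import posixpath
import zipfile

IMAGE_EXTS = {".tif", ".jp2", ".jpg"}


def pair_pages(zip_bytes):
    """Group archive members by base name into {"image": name, "ocr": text}."""
    pages = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            stem, ext = posixpath.splitext(posixpath.basename(name))
            if not stem:
                continue  # skip directory entries
            slot = pages.setdefault(stem, {"image": None, "ocr": None})
            if ext.lower() in IMAGE_EXTS:
                slot["image"] = name
            elif ext.lower() == ".txt":
                # OCR text is UTF-8 encoded, per the description above.
                slot["ocr"] = zf.read(name).decode("utf-8")
    return dict(sorted(pages.items()))


# Build a tiny stand-in archive; real member names are an assumption.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("00000001.tif", b"fake image bytes")
    zf.writestr("00000001.txt", "Title page".encode("utf-8"))

pages = pair_pages(buf.getvalue())
```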

 

Single Page Image

Example URI

Resource request for the 12th sequential page image from a public domain, DLPS-digitized book. Depending on how the page was scanned, the response format is one of the following.

  • image/tiff
  • image/jp2
  • image/jpg

http://services.hathitrust.org/api/htd/pageimage/miun.abr0732.0001.001/12

 

Single Page OCR

This resource is the UTF-8 encoded OCR plain text of a given page image.

Example URI

Resource request for the OCR text of the 30th sequential page image from an in-copyright, Google-digitized book. The response format is text/plain. Note the https:// protocol.

https://services.hathitrust.org/api/htd/pageocr/mdp.39015005102796/30

 

Security

Data that is not restricted by contractual agreement with Google and all metadata may be retrieved over HTTP without authentication or authorization.

The metadata resources are:

  • meta
  • pagemeta
  • structure

The API limits retrieval of restricted data to authenticated client applications that are authorized by HathiTrust to access that data. In addition, the data is encrypted over SSL to protect it in transit.

To access restricted data, the client application must obtain a password-protected SSL certificate from a certificate authority such as VeriSign and present the certificate and password over SSL to authenticate with the API. The certificate states that it authenticates a Common Name (CN), typically an email address. The CN must be registered with HathiTrust.org. Only applications presenting certificates for registered Common Names are authorized to retrieve restricted data. Authorization is negotiated out-of-band by contacting HathiTrust.org.
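In Python, presenting a client certificate could look roughly like the sketch below, which uses the standard ssl and http.client modules. The function names and file paths are hypothetical. It also assumes that the password protects the private key stored on the client side; the exact handshake HathiTrust expects is not specified here.

```python
import http.client
import ssl


def make_client_context(cert_path, key_path, password):
    """Create a TLS context that presents a client certificate."""
    context = ssl.create_default_context()
    # Assumption: the password decrypts the locally stored private key.
    context.load_cert_chain(certfile=cert_path, keyfile=key_path, password=password)
    return context


def open_connection(context, host="services.hathitrust.org"):
    """Return an HTTPS connection object; nothing is sent until a request is made."""
    return http.client.HTTPSConnection(host, context=context)
```

A caller would then issue, for example, conn.request("GET", "/api/htd/aggregate/mdp.39015002110867") on the returned connection and inspect the status of conn.getresponse().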

A client application that is authorized to display restricted data is required to have an authentication and authorization scheme in place to limit access to trusted users. Restriction is based primarily on the rights attribute.

Some repository items are open to download without restriction. There is a requirement to prevent systematic downloading of certain other items in the HathiTrust repository. For these items, a quota may be imposed to limit the download rate. Finally, some items may only be downloaded by authorized clients.

The value of the htd:access[@resource="aggregate|pageocr|pageimage"] element, for a given resource, indicates the restrictions on that resource.

htd:access value                                      Explanation
http://schemas.hathitrust.org/htd/2009#restricted     authentication and authorization required
http://schemas.hathitrust.org/htd/2009#limited        access to public domain items is open but may be rate-limited
http://schemas.hathitrust.org/htd/2009#open           no restrictions
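A client can turn an htd:access value into a simple decision before attempting a download. The following Python sketch mirrors the three values above; the names AccessPolicy and describe_access are hypothetical.

```python
from typing import NamedTuple

ACCESS_NS = "http://schemas.hathitrust.org/htd/2009#"
LEVELS = ("open", "limited", "restricted")


class AccessPolicy(NamedTuple):
    level: str
    needs_auth: bool            # authentication and authorization required
    may_be_rate_limited: bool   # open, but a download quota may apply


def describe_access(value):
    """Interpret the text content of an htd:access element."""
    # Element content in XML responses may carry surrounding whitespace.
    value = value.strip()
    if not value.startswith(ACCESS_NS):
        raise ValueError(f"not an htd:access value: {value!r}")
    level = value[len(ACCESS_NS):]
    if level not in LEVELS:
        raise ValueError(f"unknown access level: {level!r}")
    return AccessPolicy(level, level == "restricted", level == "limited")
```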

Response Codes

This section lists the HTTP response codes and their meanings in the context of the API.

Code                          Explanation
200 OK                        No error. The request to retrieve the resource was successful.
400 BAD REQUEST               Invalid request URI or HTTP header, or unsupported nonstandard parameter.
401 UNAUTHORIZED              Authorization required. This will occur when the Common Name in the SSL Certificate has not been registered with HathiTrust.org.
403 FORBIDDEN                 SSL Certificate authentication failed.
404 NOT FOUND                 Resource identified by :ID or :ID/:SEQ not found.
500 INTERNAL SERVER ERROR     Internal error. This is the default code that is used for all unrecognized errors.
503 SERVICE UNAVAILABLE       Quota exceeded.
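Since 503 signals an exceeded quota, a client may want to wait and retry that code while treating the others as final. The Python sketch below is one possible approach, not a documented requirement: the function name fetch_with_backoff, the exponential delay schedule, and the fetch callable (which is expected to return a (status, body) pair) are all assumptions.

```python
import time

MEANINGS = {
    400: "bad request",
    401: "Common Name not registered",
    403: "certificate authentication failed",
    404: "resource not found",
    500: "internal server error",
    503: "quota exceeded",
}


def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns 200, backing off only on 503."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body
        if status == 503 and attempt < max_attempts - 1:
            # Quota exceeded: wait 1x, 2x, 4x ... base_delay before retrying.
            sleep(base_delay * 2 ** attempt)
            continue
        raise RuntimeError(f"{status}: {MEANINGS.get(status, 'unrecognized status')}")
```

Passing sleep as a parameter keeps the function easy to exercise without real delays.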