HathiTrust Data API
Introduction (DRAFT - Rev. 0.7)
This document describes a RESTful API to provide access to HathiTrust repository data and metadata resources. The HathiTrust Repository Data (HTD) API is referred to simply as API in this document.
NOTE: This document is in draft status and is presented for comment.
Quick Overview
The HTD API provides extensible, efficient and secure access to the data and metadata resources of the HathiTrust Repository. The design intent is to support client applications that already have an item identifier and simply need the corresponding data (or metadata). It should make services and uses possible beyond those available through current applications. Examples of current applications are the HathiTrust Collection Builder and Pageturner.
Applications that need a large number of metadata records or data sets should request them iteratively.
The repository resources consist of digitized print or born-digital volumes composed of page images and OCR text and corresponding structural and administrative metadata. Other APIs provide sources of identifiers and bibliographic metadata. Examples include:
The API accepts a request for a resource and returns XML, JSON or binary representations of the resource. The available representations depend on the resource in question.
The resources served by the API are partitioned into classes that have varying access policies.
- meta, pagemeta and structure - Accessible without restriction. No authentication or authorization required.
- aggregate, pageimage and pageocr - Access varies:
  - Open to download without restriction, OR
  - Available to download with some restrictions on rate of access but otherwise open, OR
  - Download requires client authentication to and authorization by the API.
Search and browse are seen as separate services available through a different API than described here.
URI scheme
Concrete examples are provided in each section describing a resource below. Square brackets indicate an optional parameter. Throughout, variables are UPPERCASE prefixed with a colon, e.g. :VAR. XPath notation for elements and attributes appears occasionally.
http[s]://services.hathitrust.org/api/htd/:RESOURCE/:ID[/:SEQ]
?
[v=:N]
[&alt=json[&callback=:CALLBACK]]
Access to restricted resources is over SSL using https:// protocol.
The values for the :RESOURCE variable for version 1 are:
- meta
- structure
- aggregate
- pageimage
- pageocr
- pagemeta
The :ID variable ranges over all namespace-qualified barcodes or other logical identifiers for repository objects. Examples of namespaces are mdp, miun, wu.
The :SEQ variable is an integer starting at 1 and ranges up to the number of page images in the object.
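As a minimal sketch of how a client might take these two variables apart before building a request, the Python below splits an :ID at its first dot and range-checks a :SEQ. The function names are hypothetical, and splitting at the first dot assumes that a namespace never contains a dot itself, which the examples suggest but this document does not state.

```python
def split_id(item_id: str) -> tuple:
    """Split a namespace-qualified :ID into (namespace, local identifier).

    Assumption: the namespace is everything before the FIRST dot.
    """
    namespace, dot, local_id = item_id.partition(".")
    if not (namespace and dot and local_id):
        raise ValueError(f"not a namespace-qualified identifier: {item_id!r}")
    return namespace, local_id


def check_seq(seq: int, numpages: int) -> int:
    """Return seq if it lies in 1..numpages, otherwise raise ValueError."""
    if not 1 <= seq <= numpages:
        raise ValueError(f"seq {seq} is outside 1..{numpages}")
    return seq
```

For instance, `split_id("miun.abr0732.0001.001")` keeps the later dots in the local identifier.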
The version query parameter (v) supports backward compatibility as the API evolves. Clients can explicitly specify the version of the requests they are issuing and of the response that they want. By default, when no version parameter is supplied, responses are in the format of the latest version of the API.
Where application/xml responses apply, alt=json requests the response in JSON format. In that case, an optional JavaScript callback function name can be supplied. The default response format is application/xml.
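The URI scheme and its query parameters can be assembled mechanically. The following Python is a sketch only: `build_uri` and its keyword names are hypothetical, the :ID is inserted into the path unescaped (an assumption that holds for the identifiers shown in this document), and no request is sent.

```python
from urllib.parse import urlencode

HOST_AND_PATH = "services.hathitrust.org/api/htd"
RESOURCES = ("meta", "structure", "aggregate", "pageimage", "pageocr", "pagemeta")


def build_uri(resource, item_id, seq=None, version=None,
              as_json=False, callback=None, secure=False):
    """Build a request URI following the scheme described above."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource!r}")
    if callback is not None and not as_json:
        raise ValueError("callback only applies together with alt=json")
    scheme = "https" if secure else "http"
    # Assumption: the :ID needs no percent-encoding.
    uri = f"{scheme}://{HOST_AND_PATH}/{resource}/{item_id}"
    if seq is not None:
        uri += f"/{seq}"
    query = []
    if version is not None:
        query.append(("v", version))
    if as_json:
        query.append(("alt", "json"))
        if callback is not None:
            query.append(("callback", callback))
    return uri + ("?" + urlencode(query) if query else "")
```

Calling `build_uri("pagemeta", "mdp.39015005102796", seq=11, as_json=True, secure=True)` reproduces the Single Page Metadata example URI given later in this document.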
HTD Extension Elements, Attributes and Schema
The schema for the XML responses is based on the Atom Syndication Format in the spirit of the response schema for a volume from the Google Book Search Data API as shown in the Data API: Reference Guide.
XML responses are formatted as atom:entry elements in the default atom namespace. The required atom:id, atom:title, atom:updated elements are present. The HTD API schema extends the Atom schema by defining and using the htd namespace.
Note that the use of the atom:entry element is adopted in the context of access to data and not necessarily of access to a feed.
The HTD API is a data API with accompanying structural and administrative metadata. It is not a bibliographic metadata API. The atom:title element contains text that describes the entry and is not the title of the book. For example,
HathiTrust Repository Data API - single page metadata.
The schema employs a URI-based scheme for additional values of the atom:link[@rel] attribute. For resource identifiers we have:
- http://schemas.hathitrust.org/htd/2009#meta
- http://schemas.hathitrust.org/htd/2009#pagemeta
- http://schemas.hathitrust.org/htd/2009#structure
- http://schemas.hathitrust.org/htd/2009#aggregate
- http://schemas.hathitrust.org/htd/2009#pageimage
- http://schemas.hathitrust.org/htd/2009#pageocr
The optional element atom:link appears with the rel=alternate and rel=self attributes.
link[@rel='alternate'] - Generally taken to mean the permalink to the content pointed to by the entry. Currently this includes a link to the HathiTrust pageturner, which is quasi-permanent, and a link to the Handle Server. For example,
http://babel.hathitrust.org/cgi/pt?id=:ID[&seq=:SEQ]
and
http://hdl.handle.net/2027/:ID
link[@rel='self'] - This is the preferred URI for retrieving the entry itself. This value is important in scenarios where only the entry is available and not the location from which the entry was retrieved.
Extension Elements
Extension elements are in the htd namespace and vary with response. Refer to example abstract responses below.
- htd:version - the version number of the API generating the response
- htd:selected_seq - the page sequence number requested (pagemeta resource only)
- htd:numpages - the number of pages in the volume
- htd:access[@resource] - asserts whether downloading the page images, OCR and zipped data is restricted to authenticated and authorized client applications or open but rate-limited or freely available. Metadata access does not require authentication and authorization. Restricted or limited access does not imply restricted viewability. Restricted data may be freely viewable or viewable only under certain conditions, e.g. based on geographical location of the client or the brittle status of the book or the authenticated identity of the user. This means the client application must make the final viewability determination and is bound by the agreements that permit downloading restricted items. Attribute values:
  - http://schemas.hathitrust.org/htd/2009#open
  - http://schemas.hathitrust.org/htd/2009#limited
  - http://schemas.hathitrust.org/htd/2009#restricted
- htd:rights - container element for rights metadata:
  - htd:namespace - the namespace of the :ID (dotted concatenation of htd:namespace and htd:id)
  - htd:id - the volume barcode
  - htd:attr - See HathiTrust rights database document
  - htd:reason - See HathiTrust rights database document
  - htd:source - See HathiTrust rights database document
  - htd:user - See HathiTrust rights database document
  - htd:time - See HathiTrust rights database document
  - htd:note - See HathiTrust rights database document
- htd:pgmap - container element for page number to page sequence number map
  - htd:pg[@pgnum] - the mapping element. Attribute is page number, content is page sequence number. One for each page number.
- htd:seqmap - container element for map of page sequence number to page number, feature, format.
  - htd:seq[@pseq] - attribute is the sequence number of the page, content is the page number
  - htd:pnum - the page number either printed or implicit (if available)
  - htd:imgfmt - format of the page image: tiff or jp2 or jpg
  - htd:pfeat - the page feature key (if available):
    - CHAPTER_START
    - COPYRIGHT
    - FIRST_CONTENT_CHAPTER_START
    - FRONT_COVER
    - INDEX
    - REFERENCES
    - TABLE_OF_CONTENTS
    - TITLE
Schema
Refer to Example Abstract Responses in each Resource section for more information.
Resources and Representations
The API provides access to the following resources. The MIME types of the available representations are shown in the table below. An example URI is provided. An example abstract response is shown for resources with application/xml representations.
| Resource | Representation(s)/MIME type(s) |
|---|---|
| Volume and Rights Metadata (meta) | application/atom+xml & application/json |
| METS (structure) | application/xml & application/json |
| zip file (aggregate) | application/zip |
| Single Page Metadata (pagemeta) | application/atom+xml & application/json |
| Single Page Image (pageimage) | image/jp2 or image/tiff or image/jpg |
| Single Page OCR (pageocr) | text/plain |
Volume and Rights Metadata
This resource consists of:
- API version number
- access values
- count of page image / OCR text pairs
- a row of the rights database consisting of the data from the following fields: id, namespace, attr, reason, source, user, time, note, as described in the database layout document
- a map of page sequence number to:
  - page number, either explicitly on the printed page or algorithmically derived during digitization
  - page feature tags as defined by the label attribute of the METS:structMap/METS:div element of the Structure (METS document) resource. See Extension Elements and METS schema.
  - page image file format, one of tiff or jp2 or jpg
- a map of page number to page sequence number
Note: Page feature and page number metadata is not available for some instances of this resource.
Compare with Single Page Metadata
Example URI
Resource request for the volume and rights metadata for a public domain, Google-digitized book in response format of application/atom+xml
http://services.hathitrust.org/api/htd/meta/mdp.39015070515765
Example Response
Edited for brevity.
<entry
xmlns="http://www.w3.org/2005/Atom"
xmlns:htd="http://schemas.hathitrust.org/htd/2009">
<link
rel="alternate"
href="http://hdl.handle.net/2027/mdp.39015070515765"
type="text/html"/>
<link
rel="self"
href="http://services.hathitrust.org/api/htd/meta/mdp.39015070515765"
type="application/atom+xml"/>
<link
rel="http://schemas.hathitrust.org/htd/2009#aggregate"
href="https://services.hathitrust.org/api/htd/aggregate/mdp.39015070515765"
type="application/zip"/>
<link
rel="http://schemas.hathitrust.org/htd/2009#structure"
href="http://services.hathitrust.org/api/htd/structure/mdp.39015070515765"
type="application/xml"/>
<htd:version>1</htd:version>
<htd:numpages>668</htd:numpages>
<htd:seqmap>
<htd:seq pseq="1">
<htd:pnum></htd:pnum>
<htd:pfeat>FRONT_COVER</htd:pfeat>
<htd:pfeat>IMAGE_ON_PAGE</htd:pfeat>
<htd:pfeat>UNTYPICAL_PAGE</htd:pfeat>
<htd:pfeat>IMPLICIT_PAGE_NUMBER</htd:pfeat>
<htd:imgfmt>image/jp2</htd:imgfmt>
</htd:seq>
<htd:seq pseq="2">
<htd:pnum>i</htd:pnum>
<htd:pfeat>UNTYPICAL_PAGE</htd:pfeat>
<htd:pfeat>IMPLICIT_PAGE_NUMBER</htd:pfeat>
<htd:imgfmt>image/jp2</htd:imgfmt>
</htd:seq>
<htd:seq pseq="3">
<htd:pnum>ii</htd:pnum>
<htd:pfeat>IMPLICIT_PAGE_NUMBER</htd:pfeat>
<htd:imgfmt>image/jp2</htd:imgfmt>
</htd:seq>
</htd:seqmap>
<htd:pgmap>
<htd:pg pgnum="i">2</htd:pg>
<htd:pg pgnum="ii">3</htd:pg>
</htd:pgmap>
<id>http://services.hathitrust.org/api/htd/meta/mdp.39015070515765</id>
<title>HathiTrust Repository Data API - metadata</title>
<updated>2009-03-11T17:29:58.602-0400</updated>
<htd:access resource="pageocr">
http://schemas.hathitrust.org/htd/2009#limited</htd:access>
<htd:access resource="pageimage">
http://schemas.hathitrust.org/htd/2009#limited</htd:access>
<htd:access resource="aggregate">
http://schemas.hathitrust.org/htd/2009#restricted</htd:access>
<htd:rights>
<htd:note/>
<htd:user>jhovater</htd:user>
<htd:time>2008-07-09T00:30:11</htd:time>
<htd:namespace>mdp</htd:namespace>
<htd:source>1</htd:source>
<htd:attr>1</htd:attr>
<htd:id>39015070515765</htd:id>
<htd:reason>1</htd:reason>
</htd:rights>
</entry>
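A client will usually reduce a response like this to a few lookup tables. The sketch below uses Python's standard `xml.etree.ElementTree` on an abbreviated copy of the response; `parse_meta` and the shape of the dictionary it returns are illustrative choices, not part of the API.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "htd": "http://schemas.hathitrust.org/htd/2009",
}

# Abbreviated copy of the example response, for illustration only.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:htd="http://schemas.hathitrust.org/htd/2009">
  <htd:numpages>668</htd:numpages>
  <htd:seqmap>
    <htd:seq pseq="1">
      <htd:pnum></htd:pnum>
      <htd:pfeat>FRONT_COVER</htd:pfeat>
      <htd:imgfmt>image/jp2</htd:imgfmt>
    </htd:seq>
    <htd:seq pseq="2">
      <htd:pnum>i</htd:pnum>
      <htd:pfeat>UNTYPICAL_PAGE</htd:pfeat>
      <htd:pfeat>IMPLICIT_PAGE_NUMBER</htd:pfeat>
      <htd:imgfmt>image/jp2</htd:imgfmt>
    </htd:seq>
  </htd:seqmap>
  <htd:pgmap>
    <htd:pg pgnum="i">2</htd:pg>
  </htd:pgmap>
  <htd:access resource="pageocr">
    http://schemas.hathitrust.org/htd/2009#limited</htd:access>
  <htd:access resource="aggregate">
    http://schemas.hathitrust.org/htd/2009#restricted</htd:access>
</entry>"""


def parse_meta(xml_text):
    """Reduce a meta response to page, page-number and access tables."""
    root = ET.fromstring(xml_text)
    pages = {}
    for seq in root.findall("htd:seqmap/htd:seq", NS):
        pages[int(seq.get("pseq"))] = {
            "pnum": seq.findtext("htd:pnum", "", NS),
            "pfeat": [el.text for el in seq.findall("htd:pfeat", NS)],
            "imgfmt": seq.findtext("htd:imgfmt", "", NS),
        }
    return {
        "numpages": int(root.findtext("htd:numpages", "0", NS)),
        "pages": pages,
        "pgmap": {pg.get("pgnum"): int(pg.text)
                  for pg in root.findall("htd:pgmap/htd:pg", NS)},
        # Access values carry surrounding whitespace in the example, so strip
        # them and keep only the fragment after '#'.
        "access": {el.get("resource"): (el.text or "").strip().rsplit("#", 1)[-1]
                   for el in root.findall("htd:access", NS)},
    }


meta = parse_meta(SAMPLE)
```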
RELAX NG Schema - Compact
default namespace = "http://www.w3.org/2005/Atom"
namespace htd = "http://schemas.hathitrust.org/htd/2009"
start =
element entry {
element link {
attribute href { xsd:anyURI },
attribute rel { xsd:anyURI },
attribute type { text }
}+,
element htd:version { xsd:integer },
element htd:numpages { xsd:integer },
element htd:seqmap {
element htd:seq {
attribute pseq { xsd:integer },
element htd:pnum { text },
element htd:pfeat { xsd:NCName }+,
element htd:imgfmt { text }
}+
},
element htd:pgmap {
element htd:pg {
attribute pgnum { xsd:NCName },
xsd:integer
}+
},
element id { xsd:anyURI },
element title { text },
element updated { xsd:NMTOKEN },
element htd:access {
attribute resource { xsd:NCName },
xsd:anyURI
}+,
element htd:rights {
element htd:note { empty },
element htd:user { xsd:NCName },
element htd:time { xsd:NMTOKEN },
element htd:namespace { xsd:NCName },
element htd:source { xsd:integer },
element htd:attr { xsd:integer },
element htd:id { xsd:integer },
element htd:reason { xsd:integer }
}
}
Single Page Metadata
This resource consists of a partial reiteration of the volume and rights metadata for the book together with the page feature metadata available for the given sequential page.
Example URI
Resource request for the metadata for the 11th sequential page of an in-copyright, Google-digitized book in response format of application/json. Compare with Volume and Rights Metadata.
https://services.hathitrust.org/api/htd/pagemeta/mdp.39015005102796/11?alt=json
Example Response
{
"xmlns": "http://www.w3.org/2005/Atom",
"xmlns:htd": "http://schemas.hathitrust.org/htd/2009",
"id": "http://services.hathitrust.org/api/htd/pagemeta/mdp.39015005102796/11",
"title":"HathiTrust Repository Data API - single page metadata",
"updated": "2009-03-12T09:48:43.885-0400",
"link": [
{
"rel": "alternate",
"href": "http://hdl.handle.net/2027/mdp.39015005102796",
"type": "text/html"
},
{
"rel": "self",
"href": "http://services.hathitrust.org/api/htd/pagemeta/mdp.39015005102796/11",
"type": "application/atom+xml"
},
{
"rel": "http://schemas.hathitrust.org/htd/2009#pageimage",
"href": "https://services.hathitrust.org/api/htd/pageimage/mdp.39015005102796/11",
"type": "image/tiff"
},
{
"rel": "http://schemas.hathitrust.org/htd/2009#pageocr",
"href": "https://services.hathitrust.org/api/htd/pageocr/mdp.39015005102796/11",
"type": "text/plain"
},
{
"rel": "http://schemas.hathitrust.org/htd/2009#aggregate",
"href": "https://services.hathitrust.org/api/htd/aggregate/mdp.39015005102796",
"type": "application/zip"
},
{
"rel": "http://schemas.hathitrust.org/htd/2009#structure",
"href": "http://services.hathitrust.org/api/htd/structure/mdp.39015005102796",
"type": "application/xml"
},
{
"rel": "http://schemas.hathitrust.org/htd/2009#meta",
"href": "http://services.hathitrust.org/api/htd/meta/mdp.39015005102796",
"type": "application/atom+xml"
}
],
"htd:version": "1",
"htd:selected_seq": "11",
"htd:numpages": "262",
"htd:seqmap": [
{
"htd:seq": {
"htd:imgfmt": "image/tiff",
"htd:pnum": "7",
"htd:pfeat": [
"FIRST_CONTENT_CHAPTER_START",
"IMPLICIT_PAGE_NUMBER"
],
"pseq": "11"
}
}
],
"htd:access": [
{
"content": "http://schemas.hathitrust.org/htd/2009#restricted",
"resource": "pageimage"
},
{
"content": "http://schemas.hathitrust.org/htd/2009#restricted",
"resource": "pageocr"
},
{
"content": "http://schemas.hathitrust.org/htd/2009#restricted",
"resource": "aggregate"
}
],
"htd:rights": {
"htd:user": "jhovater",
"htd:note": {},
"htd:time": "2008-08-14T22:30:23",
"htd:namespace": "mdp",
"htd:source": "1",
"htd:attr": "2",
"htd:id": "39015005102796",
"htd:reason": "1"
},
"htd:pgmap": [
{
"htd:pg": {
"content": "11",
"pgnum": "7"
}
}
]
}
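Consuming the JSON representation might look like the following Python sketch, which works on an abbreviated copy of the response. `summarize_page` is a hypothetical helper. Two points are assumptions rather than documented behavior: numeric values arrive as strings (as they do in the example), and a page with a single feature might be serialized as a bare string instead of a list, so the code accepts both.

```python
import json

# Abbreviated copy of the example response, for illustration only.
SAMPLE = """
{
  "htd:selected_seq": "11",
  "htd:numpages": "262",
  "htd:seqmap": [
    {"htd:seq": {"htd:imgfmt": "image/tiff",
                 "htd:pnum": "7",
                 "htd:pfeat": ["FIRST_CONTENT_CHAPTER_START", "IMPLICIT_PAGE_NUMBER"],
                 "pseq": "11"}}
  ],
  "htd:access": [
    {"content": "http://schemas.hathitrust.org/htd/2009#restricted",
     "resource": "pageimage"},
    {"content": "http://schemas.hathitrust.org/htd/2009#restricted",
     "resource": "pageocr"}
  ]
}
"""


def summarize_page(doc):
    """Pull the per-page fields out of a parsed pagemeta document."""
    seq = doc["htd:seqmap"][0]["htd:seq"]
    features = seq.get("htd:pfeat", [])
    if isinstance(features, str):  # assumption: a lone feature may be a string
        features = [features]
    return {
        "seq": int(doc["htd:selected_seq"]),
        "numpages": int(doc["htd:numpages"]),
        "pnum": seq.get("htd:pnum", ""),
        "imgfmt": seq.get("htd:imgfmt", ""),
        "features": features,
        "access": {item["resource"]: item["content"].rsplit("#", 1)[-1]
                   for item in doc["htd:access"]},
    }


page = summarize_page(json.loads(SAMPLE))
```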
RELAX NG Schema - Compact
default namespace = "http://www.w3.org/2005/Atom"
namespace htd = "http://schemas.hathitrust.org/htd/2009"
start =
element entry {
element link {
attribute href { xsd:anyURI },
attribute rel { xsd:anyURI },
attribute type { text }
}+,
element htd:version { xsd:integer },
element htd:selected_seq { xsd:integer },
element htd:numpages { xsd:integer },
element htd:seqmap {
element htd:seq {
attribute pseq { xsd:integer },
element htd:pnum { xsd:NCName },
element htd:pfeat { xsd:NCName }+,
element htd:imgfmt { text }
}
},
element htd:pgmap {
element htd:pg {
attribute pgnum { xsd:NCName },
xsd:integer
}
},
element id { xsd:anyURI },
element title { text },
element updated { xsd:NMTOKEN },
element htd:access {
attribute resource { xsd:NCName },
xsd:anyURI
}+,
element htd:rights {
element htd:note { empty },
element htd:user { xsd:NCName },
element htd:time { xsd:NMTOKEN },
element htd:namespace { xsd:NCName },
element htd:source { xsd:integer },
element htd:attr { xsd:integer },
element htd:id { xsd:integer },
element htd:reason { xsd:integer }
}
}
Structure
This resource is currently a METS document representing the volume. The application/xml representation for the METS portion of this resource is described by the METS schema. This resource gives the client application the most detailed picture of the aggregate repository object.
Example URI
Resource request for the METS document for a public domain (US), Google-digitized book in response format of application/xml where the version of the API is explicitly requested:
http://services.hathitrust.org/api/htd/structure/mdp.39015064570875?v=1
Example Response
Edited for brevity.
<entry
xmlns="http://www.w3.org/2005/Atom"
xmlns:htd="http://schemas.hathitrust.org/htd/2009">
<link
rel="alternate"
href="http://hdl.handle.net/2027/mdp.39015064570875"
type="text/html"/>
<link
rel="self"
href="http://services.hathitrust.org/api/htd/structure/mdp.39015064570875"
type="application/xml"/>
<link
rel="http://schemas.hathitrust.org/htd/2009#aggregate"
href="https://services.hathitrust.org/api/htd/aggregate/mdp.39015064570875"
type="application/zip"/>
<link
rel="http://schemas.hathitrust.org/htd/2009#meta"
href="http://services.hathitrust.org/api/htd/meta/mdp.39015064570875"
type="application/xml"/>
<htd:version>1</htd:version>
<id>http://services.hathitrust.org/api/htd/structure/mdp.39015064570875</id>
<title>HathiTrust Repository Data API - METS</title>
<updated>2009-03-12T13:33:59.933-0400</updated>
<htd:access resource="pageocr">http://schemas.hathitrust.org/htd/2009#limited
</htd:access>
<htd:access resource="pageimage">http://schemas.hathitrust.org/htd/2009#limited
</htd:access>
<htd:access resource="aggregate">http://schemas.hathitrust.org/htd/2009#restricted
</htd:access>
<htd:rights>
<htd:note/>
<htd:user>jhovater</htd:user>
<htd:time>2007-09-10T09:30:04</htd:time>
<htd:namespace>mdp</htd:namespace>
<htd:source>1</htd:source>
<htd:attr>9</htd:attr>
<htd:id>39015064570875</htd:id>
<htd:reason>1</htd:reason>
</htd:rights>
<METS:mets
xmlns:METS="http://www.loc.gov/METS/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:PREMIS="http://www.loc.gov/standards/premis"
xsi:schemaLocation="http://www.loc.gov/METS/
http://www.loc.gov/standards/mets/mets.xsd
http://purl.org/dc/elements/1.1/"
OBJID="mdp.39015064570875"
xml:base="/sdr1/obj/mdp/pairtree_root/39/01/50/64/57/08/75/39015064570875/39015064570875.mets.xml">
<METS:metsHdr ID="mdp.39015064570875" CREATEDATE="2008-06-05T16:06:23" RECORDSTATUS="NEW">
<METS:agent ROLE="CREATOR" TYPE="ORGANIZATION">
<METS:name>DLPS</METS:name>
</METS:agent>
</METS:metsHdr>
<METS:dmdSec ID="DMD1">
<METS:mdRef
MDTYPE="MARC"
LOCTYPE="OTHER"
OTHERLOCTYPE="Item ID stored as second call number in item record"
XPTR="mdp.39015064570875"/>
</METS:dmdSec>
<METS:amdSec>
<METS:techMD ID="TMD1">
<METS:mdRef
LOCTYPE="OTHER"
OTHERLOCTYPE="SYSTEM"
MDTYPE="OTHER"
OTHERMDTYPE="text"
LABEL="production notes"
xlink:href="notes.txt"/>
</METS:techMD>
<METS:techMD ID="TMD2">
<METS:mdRef
LOCTYPE="OTHER"
OTHERLOCTYPE="SYSTEM"
MDTYPE="OTHER"
OTHERMDTYPE="text"
LABEL="page metadata"
xlink:href="pagedata.txt"/>
</METS:techMD>
<METS:techMD ID="premisobject1">
<METS:mdWrap MDTYPE="PREMIS">
<METS:xmlData>
<PREMIS:object>
<PREMIS:preservationLevel>1</PREMIS:preservationLevel>
</PREMIS:object>
</METS:xmlData>
</METS:mdWrap>
</METS:techMD>
<METS:digiprovMD ID="premisevent1">
<METS:mdWrap MDTYPE="PREMIS">
<METS:xmlData>
<PREMIS:event>
<PREMIS:eventIdentifier>
<PREMIS:eventIdentifierValue>capture1</PREMIS:eventIdentifierValue>
</PREMIS:eventIdentifier>
<PREMIS:eventType>capture</PREMIS:eventType>
<PREMIS:eventDateTime>2007-06-18T00:00:00</PREMIS:eventDateTime>
<PREMIS:linkingAgentIdentifier>
<PREMIS:linkingAgentIdentifierType>AgentID</PREMIS:linkingAgentIdentifierType>
<PREMIS:linkingAgentIdentifierValue>Google, Inc.</PREMIS:linkingAgentIdentifierValue>
</PREMIS:linkingAgentIdentifier>
</PREMIS:event>
<PREMIS:event>
<PREMIS:eventIdentifier>
<PREMIS:eventIdentifierValue>compression1</PREMIS:eventIdentifierValue>
</PREMIS:eventIdentifier>
<PREMIS:eventType>compression</PREMIS:eventType>
<PREMIS:eventDateTime>2007-09-05T10:09:00</PREMIS:eventDateTime>
<PREMIS:linkingAgentIdentifier>
<PREMIS:linkingAgentIdentifierType>AgentID</PREMIS:linkingAgentIdentifierType>
<PREMIS:linkingAgentIdentifierValue>Google, Inc.</PREMIS:linkingAgentIdentifierValue>
</PREMIS:linkingAgentIdentifier>
</PREMIS:event>
<PREMIS:event>
<PREMIS:eventIdentifier>
<PREMIS:eventIdentifierValue>decryption1</PREMIS:eventIdentifierValue>
</PREMIS:eventIdentifier>
<PREMIS:eventType>decryption</PREMIS:eventType>
<PREMIS:eventDateTime>2007-09-08T02:57:45</PREMIS:eventDateTime>
<PREMIS:linkingAgentIdentifier>
<PREMIS:linkingAgentIdentifierType>AgentID</PREMIS:linkingAgentIdentifierType>
<PREMIS:linkingAgentIdentifierValue>UM</PREMIS:linkingAgentIdentifierValue>
</PREMIS:linkingAgentIdentifier>
</PREMIS:event>
<PREMIS:event>
<PREMIS:eventIdentifier>
<PREMIS:eventIdentifierValue>fixity check1</PREMIS:eventIdentifierValue>
</PREMIS:eventIdentifier>
<PREMIS:eventType>fixity check</PREMIS:eventType>
<PREMIS:eventDateTime>2007-09-08T02:57:45</PREMIS:eventDateTime>
<PREMIS:eventOutcomeInformation>
<PREMIS:eventOutcomeDetail>pass</PREMIS:eventOutcomeDetail>
</PREMIS:eventOutcomeInformation>
<PREMIS:linkingAgentIdentifier>
<PREMIS:linkingAgentIdentifierType>AgentID</PREMIS:linkingAgentIdentifierType>
<PREMIS:linkingAgentIdentifierValue>UM</PREMIS:linkingAgentIdentifierValue>
</PREMIS:linkingAgentIdentifier>
</PREMIS:event>
<PREMIS:event>
<PREMIS:eventIdentifier>
<PREMIS:eventIdentifierValue>ingestion1</PREMIS:eventIdentifierValue>
</PREMIS:eventIdentifier>
<PREMIS:eventType>ingestion</PREMIS:eventType>
<PREMIS:eventDateTime>2007-09-08T02:57:45</PREMIS:eventDateTime>
<PREMIS:linkingAgentIdentifier>
<PREMIS:linkingAgentIdentifierType>AgentID</PREMIS:linkingAgentIdentifierType>
<PREMIS:linkingAgentIdentifierValue>UM</PREMIS:linkingAgentIdentifierValue>
</PREMIS:linkingAgentIdentifier>
</PREMIS:event>
<PREMIS:event>
<PREMIS:eventIdentifier>
<PREMIS:eventIdentifierValue>message digest calculation1</PREMIS:eventIdentifierValue>
</PREMIS:eventIdentifier>
<PREMIS:eventType>message digest calculation</PREMIS:eventType>
<PREMIS:eventDateTime>2007-09-05T10:09:00</PREMIS:eventDateTime>
<PREMIS:eventDetail>jhove1_1e</PREMIS:eventDetail>
<PREMIS:linkingAgentIdentifier>
<PREMIS:linkingAgentIdentifierType>AgentID</PREMIS:linkingAgentIdentifierType>
<PREMIS:linkingAgentIdentifierValue>Google, Inc.</PREMIS:linkingAgentIdentifierValue>
</PREMIS:linkingAgentIdentifier>
</PREMIS:event>
<PREMIS:event>
<PREMIS:eventIdentifier>
<PREMIS:eventIdentifierValue>validation1</PREMIS:eventIdentifierValue>
</PREMIS:eventIdentifier>
<PREMIS:eventType>validation</PREMIS:eventType>
<PREMIS:eventDateTime>2007-09-08T02:57:45</PREMIS:eventDateTime>
<PREMIS:linkingAgentIdentifier>
<PREMIS:linkingAgentIdentifierType>AgentID</PREMIS:linkingAgentIdentifierType>
<PREMIS:linkingAgentIdentifierValue>UM</PREMIS:linkingAgentIdentifierValue>
</PREMIS:linkingAgentIdentifier>
</PREMIS:event>
</METS:xmlData>
</METS:mdWrap>
</METS:digiprovMD>
</METS:amdSec>
<METS:fileSec>
<METS:fileGrp ID="FG1" USE="zip archive">
<METS:file ID="ZIP00000001" MIMETYPE="application/zip" SEQ="00000001" CREATED="2008-06-05T16:06:23" SIZE="7065774" CHECKSUM="b26aff19a616a83eefd0ffcb43be10b0" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="39015064570875.zip"/>
</METS:file>
</METS:fileGrp>
<METS:fileGrp ID="FG2" USE="image">
<METS:file ID="IMG00000001" MIMETYPE="image/tiff" SEQ="00000001" CREATED="2007-09-05T13:07:51" SIZE="1498" CHECKSUM="ad1491cf5c381b7752e5b53f3b50621c" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000001.tif"/>
</METS:file>
<METS:file ID="IMG00000002" MIMETYPE="image/jp2" SEQ="00000002" CREATED="2007-09-05T13:07:52" SIZE="37689" CHECKSUM="3c6300639e57f0e9305f6130565aab51" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000002.jp2"/>
</METS:file>
<METS:file ID="IMG00000003" MIMETYPE="image/tiff" SEQ="00000003" CREATED="2007-09-05T13:07:52" SIZE="1972" CHECKSUM="b097fa8086b852c07f169b63c3064cc8" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000003.tif"/>
</METS:file>
<METS:file ID="IMG00000004" MIMETYPE="image/tiff" SEQ="00000004" CREATED="2007-09-05T13:07:52" SIZE="1970" CHECKSUM="8d82918c9d351b07f92d84d0d2c98897" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000004.tif"/>
</METS:file>
</METS:fileGrp>
<METS:fileGrp ID="FG3" USE="ocr">
<METS:file ID="TXT00000001" MIMETYPE="text/plain" SEQ="00000001" CREATED="2007-09-05T13:07:51" SIZE="0" CHECKSUM="d41d8cd98f00b204e9800998ecf8427e" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000001.txt"/>
</METS:file>
<METS:file ID="TXT00000002" MIMETYPE="text/plain" SEQ="00000002" CREATED="2007-09-05T13:07:52" SIZE="52" CHECKSUM="226bdcccdb2089e12129a77c137a0eef" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000002.txt"/>
</METS:file>
<METS:file ID="TXT00000003" MIMETYPE="text/plain" SEQ="00000003" CREATED="2007-09-05T13:07:52" SIZE="0" CHECKSUM="d41d8cd98f00b204e9800998ecf8427e" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000003.txt"/>
</METS:file>
<METS:file ID="TXT00000004" MIMETYPE="text/plain" SEQ="00000004" CREATED="2007-09-05T13:07:52" SIZE="0" CHECKSUM="d41d8cd98f00b204e9800998ecf8427e" CHECKSUMTYPE="MD5">
<METS:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000004.txt"/>
</METS:file>
</METS:fileGrp>
</METS:fileSec>
<METS:structMap ID="SM1" TYPE="physical">
<METS:div TYPE="volume">
<METS:div ORDER="1" TYPE="page" LABEL="FRONT_COVER, IMPLICIT_PAGE_NUMBER, MISSING_PAGE">
<METS:fptr FILEID="IMG00000001"/>
<METS:fptr FILEID="TXT00000001"/>
</METS:div>
<METS:div ORDER="2" TYPE="page" LABEL="IMAGE_ON_PAGE, UNTYPICAL_PAGE, IMPLICIT_PAGE_NUMBER">
<METS:fptr FILEID="IMG00000002"/>
<METS:fptr FILEID="TXT00000002"/>
</METS:div>
<METS:div ORDER="3" TYPE="page" LABEL="BLANK, IMPLICIT_PAGE_NUMBER">
<METS:fptr FILEID="IMG00000003"/>
<METS:fptr FILEID="TXT00000003"/>
</METS:div>
<METS:div ORDER="4" TYPE="page" LABEL="BLANK, IMPLICIT_PAGE_NUMBER">
<METS:fptr FILEID="IMG00000004"/>
<METS:fptr FILEID="TXT00000004"/>
</METS:div>
</METS:div>
</METS:structMap>
</METS:mets>
</entry>
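The physical structMap is the part of this response that ties each page to its feature labels and files. As a sketch, the Python below walks the METS:div elements of an abbreviated document and collects them per page. `page_structure` is a hypothetical name, and splitting LABEL on commas is an assumption based on the example values.

```python
import xml.etree.ElementTree as ET

METS_URI = "http://www.loc.gov/METS/"
METS_NS = {"METS": METS_URI}

# Abbreviated copy of the structMap from the example, for illustration only.
SAMPLE = """<METS:mets xmlns:METS="http://www.loc.gov/METS/">
  <METS:structMap ID="SM1" TYPE="physical">
    <METS:div TYPE="volume">
      <METS:div ORDER="1" TYPE="page"
                LABEL="FRONT_COVER, IMPLICIT_PAGE_NUMBER, MISSING_PAGE">
        <METS:fptr FILEID="IMG00000001"/>
        <METS:fptr FILEID="TXT00000001"/>
      </METS:div>
      <METS:div ORDER="3" TYPE="page" LABEL="BLANK, IMPLICIT_PAGE_NUMBER">
        <METS:fptr FILEID="IMG00000003"/>
        <METS:fptr FILEID="TXT00000003"/>
      </METS:div>
    </METS:div>
  </METS:structMap>
</METS:mets>"""


def page_structure(xml_text):
    """Map each page ORDER to its feature labels and referenced file IDs."""
    root = ET.fromstring(xml_text)
    pages = {}
    # iter() searches the whole tree, so this also works when the METS
    # document is wrapped in an Atom entry.
    for div in root.iter(f"{{{METS_URI}}}div"):
        if div.get("TYPE") != "page":
            continue
        # Assumption: LABEL is a comma-separated list of feature keys.
        labels = [part.strip() for part in div.get("LABEL", "").split(",")
                  if part.strip()]
        files = [fptr.get("FILEID") for fptr in div.findall("METS:fptr", METS_NS)]
        pages[int(div.get("ORDER"))] = {"features": labels, "files": files}
    return pages


structure = page_structure(SAMPLE)
```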
RELAX NG Schema - Compact
default namespace = "http://www.w3.org/2005/Atom"
namespace METS = "http://www.loc.gov/METS/"
namespace PREMIS = "http://www.loc.gov/standards/premis"
namespace htd = "http://schemas.hathitrust.org/htd/2009"
namespace xlink = "http://www.w3.org/1999/xlink"
namespace xsi = "http://www.w3.org/2001/XMLSchema-instance"
start =
element entry {
element link {
attribute href { xsd:anyURI },
attribute rel { xsd:anyURI },
attribute type { text }
}+,
element htd:version { xsd:integer },
element id { xsd:anyURI },
element title { text },
element updated { xsd:NMTOKEN },
element htd:access {
attribute resource { xsd:NCName },
xsd:anyURI
}+,
element htd:rights {
element htd:note { empty },
element htd:user { xsd:NCName },
element htd:time { xsd:NMTOKEN },
element htd:namespace { xsd:NCName },
element htd:source { xsd:integer },
element htd:attr { xsd:integer },
element htd:id { xsd:integer },
element htd:reason { xsd:integer }
},
element METS:mets {
attribute OBJID { xsd:NCName },
attribute xsi:schemaLocation { text },
attribute xml:base { text },
element METS:metsHdr {
attribute CREATEDATE { xsd:NMTOKEN },
attribute ID { xsd:NCName },
attribute RECORDSTATUS { xsd:NCName },
element METS:agent {
attribute ROLE { xsd:NCName },
attribute TYPE { xsd:NCName },
element METS:name { xsd:NCName }
}
},
element METS:dmdSec {
attribute ID { xsd:NCName },
mdRef
},
element METS:amdSec {
element METS:techMD {
attribute ID { xsd:NCName },
(mdRef | mdWrap)
}+,
element METS:digiprovMD {
attribute ID { xsd:NCName },
mdWrap
}
},
element METS:fileSec {
element METS:fileGrp {
attribute ID { xsd:NCName },
attribute USE { text },
element METS:file {
attribute CHECKSUM { text },
attribute CHECKSUMTYPE { xsd:NCName },
attribute CREATED { xsd:NMTOKEN },
attribute ID { xsd:NCName },
attribute MIMETYPE { text },
attribute SEQ { xsd:integer },
attribute SIZE { xsd:integer },
element METS:FLocat {
attribute LOCTYPE { xsd:NCName },
attribute OTHERLOCTYPE { xsd:NCName },
attribute xlink:href { xsd:NMTOKEN }
}
}+
}+
},
element METS:structMap {
attribute ID { xsd:NCName },
attribute TYPE { xsd:NCName },
\div
}
}
}
mdRef =
element METS:mdRef {
attribute LABEL { text }?,
attribute LOCTYPE { xsd:NCName },
attribute MDTYPE { xsd:NCName },
attribute OTHERLOCTYPE { text },
attribute OTHERMDTYPE { xsd:NCName }?,
attribute XPTR { xsd:NCName }?,
attribute xlink:href { xsd:NCName }?
}
mdWrap =
element METS:mdWrap {
attribute MDTYPE { xsd:NCName },
element METS:xmlData {
element PREMIS:object {
element PREMIS:preservationLevel { xsd:integer }
}
| element PREMIS:event {
element PREMIS:eventIdentifier {
element PREMIS:eventIdentifierValue { text }
},
element PREMIS:eventType { text },
element PREMIS:eventDateTime { xsd:NMTOKEN },
(element PREMIS:eventDetail { xsd:NCName }
| element PREMIS:eventOutcomeInformation {
element PREMIS:eventOutcomeDetail { xsd:NCName }
})?,
element PREMIS:linkingAgentIdentifier {
element PREMIS:linkingAgentIdentifierType { xsd:NCName },
element PREMIS:linkingAgentIdentifierValue { text }
}
}+
}
}
\div =
element METS:div {
attribute LABEL { text }?,
attribute ORDER { xsd:integer }?,
attribute TYPE { xsd:NCName },
(\div,
element METS:fptr {
attribute FILEID { xsd:NCName }
}+)?
}
Aggregate
This resource is a zip file sent as application/zip. Currently this resource has only one structure:
- for each page in the resource:
  - the page image
  - corresponding UTF-8 encoded OCR plain text
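Once downloaded, the archive can be unpacked into image/OCR pairs. The Python sketch below makes assumptions this document does not confirm: that member names follow the zero-padded pattern seen in the METS example (`00000001.tif`, `00000001.txt`), that the image suffixes are `.tif`, `.jp2` or `.jpg`, and that any member whose base name is not purely numeric can be ignored. `pair_pages` is a hypothetical helper operating on the raw bytes of the zip.

```python
import io
import zipfile
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".tif", ".jp2", ".jpg"}  # assumed from the METS example


def pair_pages(zip_bytes):
    """Group archive members into {sequence: {"image": name, "ocr": text}}."""
    pages = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)
            if not path.stem.isdigit():
                continue  # assumption: page files are named by sequence number
            suffix = path.suffix.lower()
            if suffix in IMAGE_SUFFIXES:
                pages.setdefault(path.stem, {})["image"] = name
            elif suffix == ".txt":
                # The OCR text is documented as UTF-8 encoded.
                text = archive.read(name).decode("utf-8")
                pages.setdefault(path.stem, {})["ocr"] = text
    return dict(sorted(pages.items()))
```

The image is returned by member name rather than as bytes so that large volumes are not held in memory twice.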
Example URI
Resource request for the zip file for an in-copyright, Google-digitized book in response format of application/zip. Note (https://) protocol.
https://services.hathitrust.org/api/htd/aggregate/mdp.39015002110867
Single Page Image
Example URI
Resource request for the 12th sequential page image from a public domain, DLPS-digitized book. Depending on how the page was scanned, the response format is one of the following:
- image/tiff
- image/jp2
- image/jpg
http://services.hathitrust.org/api/htd/pageimage/miun.abr0732.0001.001/12
Single Page OCR
This resource is the UTF-8 encoded OCR plain text of a given page image.
Example URI
Resource request for the OCR text of the 30th sequential page image from an in-copyright, Google-digitized book. The response format is text/plain. Note (https://) protocol.
https://services.hathitrust.org/api/htd/pageocr/mdp.39015005102796/30
Security
Data that is not restricted by contractual agreement with Google and all metadata may be retrieved over HTTP without authentication or authorization.
The metadata resources are meta, pagemeta and structure.
The API limits retrieval of restricted data to authenticated client applications that are authorized by HathiTrust to access that data. In addition, the data is encrypted over SSL to protect it in transit.
To access restricted data, the client application must obtain a password-protected SSL certificate from a certificate authority such as VeriSign and present the certificate and password over SSL to authenticate with the API. The certificate states that it authenticates a Common Name (CN), typically an email address. The CN must be registered with HathiTrust.org. Only applications presenting certificates for registered Common Names are authorized to retrieve restricted data. Authorization is negotiated out-of-band by contacting HathiTrust.org.
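On the client side, presenting a certificate could be sketched in Python with the standard `ssl` and `urllib.request` modules. The function names and file paths are hypothetical, and the sketch assumes that the password protects the private key file and is used locally to unlock it. How the API itself expects the password to be presented is not specified here.

```python
import ssl
import urllib.request


def make_client_context(certfile, keyfile, password):
    """Create a TLS context that presents a client certificate."""
    context = ssl.create_default_context()  # also verifies the server certificate
    # Assumption: the password decrypts the private key stored in keyfile.
    context.load_cert_chain(certfile=certfile, keyfile=keyfile, password=password)
    return context


def fetch_restricted(url, context, timeout=30):
    """Fetch a restricted resource, refusing any non-https URI."""
    if not url.startswith("https://"):
        raise ValueError("restricted resources must be requested over https://")
    with urllib.request.urlopen(url, context=context, timeout=timeout) as response:
        return response.read()
```

A caller would build the context once, for example `make_client_context("client.pem", "client.key", "secret")` with its own file names, and reuse it for every restricted request.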
A client application that is authorized to display restricted data is required to have an authentication and authorization scheme in place to limit access to trusted users. Restriction is based primarily on the rights attribute.
Some repository items are open to download without restriction. There is a requirement to prevent systematic downloading of certain other items in the HathiTrust repository. For these items, a quota may be imposed to limit the download rate. Finally, some items may only be downloaded by authorized clients.
The value of the htd:access[@resource="aggregate|pageocr|pageimage"] element, for a given resource, indicates the restrictions on that resource.
| htd:access value | Explanation |
|---|---|
| http://schemas.hathitrust.org/htd/2009#restricted | authentication and authorization required |
| http://schemas.hathitrust.org/htd/2009#limited | access to public domain items is open but may be rate-limited |
| http://schemas.hathitrust.org/htd/2009#open | no restrictions |
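A client can turn these three values into a simple decision before attempting a download. The mapping below is a sketch in Python. The dictionary keys `needs_auth` and `may_be_rate_limited` and the function name are hypothetical.

```python
ACCESS_BASE = "http://schemas.hathitrust.org/htd/2009#"

# Hypothetical policy flags derived from the table above.
POLICIES = {
    "open": {"needs_auth": False, "may_be_rate_limited": False},
    "limited": {"needs_auth": False, "may_be_rate_limited": True},
    "restricted": {"needs_auth": True, "may_be_rate_limited": False},
}


def access_policy(value):
    """Translate an htd:access value into policy flags."""
    uri = value.strip()  # example responses carry surrounding whitespace
    if not uri.startswith(ACCESS_BASE):
        raise ValueError(f"not an htd:access value: {value!r}")
    key = uri[len(ACCESS_BASE):]
    if key not in POLICIES:
        raise ValueError(f"unknown access level: {key!r}")
    return POLICIES[key]
```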
Response Codes
This section lists the HTTP response codes and their meanings in the context of the API.
| Code | Explanation |
|---|---|
| 200 OK | No error. The request to retrieve the resource was successful. |
| 400 BAD REQUEST | Invalid request URI or HTTP header, or unsupported nonstandard parameter. |
| 401 UNAUTHORIZED | Authorization required. This will occur when the Common Name in the SSL Certificate has not been registered with HathiTrust.org. |
| 403 FORBIDDEN | SSL Certificate authentication failed. |
| 404 NOT FOUND | Resource identified by :ID or :ID/:SEQ not found. |
| 500 INTERNAL SERVER ERROR | Internal error. This is the default code that is used for all unrecognized errors. |
| 503 SERVICE UNAVAILABLE | Quota exceeded. |
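One way a client might act on these codes is sketched below in Python. Retrying a 503 with exponential backoff is a design assumption, not something the API prescribes, and `fetch` stands for any hypothetical callable that performs one request and returns a `(status, body)` pair.

```python
import time

# Suggested client reactions, paraphrasing the table above.
ACTIONS = {
    200: "ok",
    400: "fix the request URI, headers or parameters",
    401: "register the certificate Common Name with HathiTrust",
    403: "check the SSL certificate",
    404: "check :ID and :SEQ",
    500: "internal server error",
    503: "quota exceeded; retry later",
}


def action_for(status):
    """Return a short description of what the client should do next."""
    return ACTIONS.get(status, "unexpected status code")


def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a status other than 503.

    Waits base_delay, then 2x, 4x, ... between attempts and gives up after
    max_attempts, returning the last (status, body) pair.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = fetch()
        if status != 503:
            break
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

Passing `sleep` as a parameter keeps the function testable without real delays.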

