HathiTrust Rights API Phase 1

Phase 1: multiple identifiers for a single record

(Note that this builds on work done in Phase 0)

Often, you'll have several supposedly unique identifiers at your disposal; with the Phase 1 code, you can send them all in one request and get a set of scored records in the response.

Phase 1 Input Example

We start with a query that could easily have come from an OPAC web page (see below for all possible input parameters).

 http://mirlyn.lib.umich.edu/cgi-bin/sdrsmd?id=1&oclc=6861637&lccn=80024367&isbn=0060404531&isbn=9780060404536

Here, we throw everything we know about this record -- oclc, lccn, and both the 10- and 13-character ISBNs -- at sdrsmd. What we get is a record with a score:

{
   "error" : null,
   "id" : "1",
   "result" : {
      "1" : [
         {
            "oclc" : [
               "6861637"

            ],
            "lccn" : [
               "80024367"
            ],
            "sdr" : {
               "rights" : "searchonly",
               "handle" : "mdp.39015000000482",
               "mburl" : "http://hdl.handle.net/2027/mdp.39015000000482"

            },
            "isbn" : [
               "0060404531",
               "9780060404536"
            ],
            "score" : 225,
            "matchPercentage" : 100,
            "matchedItems" : 4
         }
      ]
   }
}

In addition to all the other information we know and love from Phase 0, we get three more items:

  • score is the total score, as explained below in the Scoring section.
  • matchedItems is the total number of items matched (in this case, one oclc number, one lccn, and two ISBNs).
  • matchPercentage is the percentage of the identifiers you sent that match this record -- in this case, a perfect 4/4 for a percentage of 100.
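
To reproduce the request above from a script, the query string can be assembled from a list of (index, value) pairs, which keeps repeated parameters such as isbn intact. This is only a minimal sketch: the endpoint is the one shown in the example, while build_query and its arguments are hypothetical names introduced here for illustration.

```python
from urllib.parse import urlencode

# Endpoint taken from the example request above.
BASE_URL = "http://mirlyn.lib.umich.edu/cgi-bin/sdrsmd"


def build_query(request_id, identifiers):
    """Return a query URL (hypothetical helper).

    identifiers is a list of (index, value) pairs; a list rather than a
    dict is used so the same index (e.g. isbn) can appear more than once.
    """
    params = [("id", request_id)] + list(identifiers)
    return BASE_URL + "?" + urlencode(params)


url = build_query("1", [
    ("oclc", "6861637"),
    ("lccn", "80024367"),
    ("isbn", "0060404531"),
    ("isbn", "9780060404536"),
])
print(url)
```

The printed URL should be the same as the example request.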

Phase 1 Input -- contradictory input data

What if we got the wrong lccn? And, by some really bad luck, it's actually a valid lccn in the system?

  

{
   "error" : null,
   "id" : "1",
   "result" : {
      "1" : [
         {
            "oclc" : [
               "6861637"

            ],
            "lccn" : [
               "80024367"
            ],
            "sdr" : {
               "rights" : "searchonly",
               "handle" : "mdp.39015000000482",
               "mburl" : "http://hdl.handle.net/2027/mdp.39015000000482"

            },
            "isbn" : [
               "0060404531",
               "9780060404536"
            ],
            "score" : 150,
            "matchedItems" : 3,
            "matchPercentage" : "75"

         },
         {
            "oclc" : [
               "4667523"
            ],
            "lccn" : [
               "77906307"
            ],
            "sdr" : {
               "rights" : "searchonly",
               "handle" : "mdp.39015000000490",
               "mburl" : "http://hdl.handle.net/2027/mdp.39015000000490"

            },
            "score" : 75,
            "matchedItems" : 1,
            "matchPercentage" : "25"
         }
      ]
   }
}  
  

Here we get two records, pre-sorted by the server based on score, then (if necessary) by matchedItems.

The first has both a higher score and a higher number (and thus percentage) of matched items, and is therefore considered by the server to be the "best" match. The second has one good match -- lccn -- and is included just in case.

It's up to the client to determine what threshold should represent the "worst still-usable" data. The server will always return all matches.
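
Since the threshold is left to the client, one way to apply it is sketched below. The cut-off values and the usable_matches helper are assumptions made for illustration, not part of the service. The sample response is an abbreviated copy of the two-record example above. Note that matchPercentage appears as a number in one example and as a string in the other, so the sketch converts it before comparing.

```python
import json

# Hypothetical cut-offs; pick values that suit your own application.
MIN_SCORE = 100
MIN_PERCENTAGE = 50


def usable_matches(response_text, request_id,
                   min_score=MIN_SCORE, min_percentage=MIN_PERCENTAGE):
    """Return the matches for request_id that clear both thresholds."""
    response = json.loads(response_text)
    if response.get("error") is not None:
        raise RuntimeError("server reported an error: %r" % (response["error"],))
    matches = (response.get("result") or {}).get(request_id, [])
    # The server has already sorted the list, so filtering preserves its order.
    return [m for m in matches
            if m["score"] >= min_score
            and float(m["matchPercentage"]) >= min_percentage]


# Abbreviated copy of the two-record response shown above.
SAMPLE = """
{"error": null, "id": "1", "result": {"1": [
  {"sdr": {"handle": "mdp.39015000000482"},
   "score": 150, "matchedItems": 3, "matchPercentage": "75"},
  {"sdr": {"handle": "mdp.39015000000490"},
   "score": 75, "matchedItems": 1, "matchPercentage": "25"}
]}}
"""

kept = usable_matches(SAMPLE, "1")
for match in kept:
    print(match["sdr"]["handle"], match["score"])
```

With these particular cut-offs only the first record is kept; a client that wants to see weaker candidates can lower either threshold.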

Input values and Scoring

The scoring process is essentially arbitrary at this point -- any feedback would be much appreciated.

Index    Score   Example              Description
handle   100     mdp.39015000000482   MDP Handle
oclc     100     4667523              OCLC Number
sdr      100     wu1000063            SDR Member organization submitted code
lccn     75      80024367             Library of Congress Control Number
isbn     25      0060404531           10 or 13 character ISBN
issn     25      10000453             ISSN
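
The numbers in the example responses are consistent with a simple rule: add up the score of every matched value, and divide the matched count by the number of identifiers sent. The sketch below assumes that rule. The server's actual computation is not documented here, and summarize is a hypothetical helper.

```python
# Per-index scores copied from the table above.
WEIGHTS = {
    "handle": 100,
    "oclc": 100,
    "sdr": 100,
    "lccn": 75,
    "isbn": 25,
    "issn": 25,
}


def summarize(sent, matched):
    """Recompute score, matchedItems and matchPercentage (assumed rule).

    sent and matched are lists of (index, value) pairs: everything the
    client sent, and the subset that matched one particular record.
    """
    score = sum(WEIGHTS[index] for index, _ in matched)
    matched_items = len(matched)
    percentage = 100 * matched_items / len(sent) if sent else 0
    return {
        "score": score,
        "matchedItems": matched_items,
        "matchPercentage": percentage,
    }


# First example: all four identifiers match the same record.
sent = [("oclc", "6861637"), ("lccn", "80024367"),
        ("isbn", "0060404531"), ("isbn", "9780060404536")]
print(summarize(sent, sent))

# Contradictory example: the lccn belongs to a different record.
sent_bad = [("oclc", "6861637"), ("lccn", "77906307"),
            ("isbn", "0060404531"), ("isbn", "9780060404536")]
first = [pair for pair in sent_bad if pair[0] != "lccn"]
second = [("lccn", "77906307")]
print(summarize(sent_bad, first))
print(summarize(sent_bad, second))
```

Under this assumption the three calls give scores of 225, 150 and 75, matching the score, matchedItems and matchPercentage values in the responses above.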