Phase 1: multiple identifiers for a single record
(Note that this builds on work done in Phase 0 [1].)
Often, you'll have several supposedly-unique identifiers at your disposal; with the Phase 1 code, you can send them all and get a set of scored records in the response.
We start with a query that could easily have come from an OPAC web page (see below for all possible input parameters).
http://mirlyn.lib.umich.edu/cgi-bin/sdrsmd?id=1&oclc=6861637&lccn=80024367&isbn=0060404531&isbn=9780060404536
Here, we throw everything we know about this record -- oclc, lccn, and both the 10- and 13-character ISBNs -- at sdrsmd. What we get back is a single record with a score:
{
"error" : null,
"id" : "1",
"result" : {
"1" : [
{
"oclc" : [
"6861637"
],
"lccn" : [
"80024367"
],
"sdr" : {
"rights" : "searchonly",
"handle" : "mdp.39015000000482",
"mburl" : "http://hdl.handle.net/2027/mdp.39015000000482"
},
"isbn" : [
"0060404531",
"9780060404536"
],
"score" : 225,
"matchPercentage" : 100,
"matchedItems" : 4
}
]
}
}
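As a rough illustration, the request URL shown earlier could be assembled on the client side as follows. This is only a sketch: the helper name build_query is hypothetical, and it assumes the service accepts ordinary URL-encoded parameters with repeated keys (as the two isbn parameters in the example suggest).

```python
from urllib.parse import urlencode

# Endpoint taken from the example query in this document.
BASE_URL = "http://mirlyn.lib.umich.edu/cgi-bin/sdrsmd"


def build_query(request_id, identifiers):
    """Build a query URL (hypothetical helper).

    `identifiers` is a list of (index, value) pairs; a list is used
    instead of a dict so that an index such as isbn can appear twice.
    """
    params = [("id", request_id)] + list(identifiers)
    return BASE_URL + "?" + urlencode(params)


url = build_query("1", [
    ("oclc", "6861637"),
    ("lccn", "80024367"),
    ("isbn", "0060404531"),
    ("isbn", "9780060404536"),
])
print(url)
```

Passing the identifiers as a list of pairs keeps their order and allows the same index to be sent more than once.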
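In addition to all the other information we know and love from Phase 0 [1], we get three more items: score, matchPercentage, and matchedItems. Here, all four of the identifiers we sent matched the record, so matchedItems is 4 and matchPercentage is 100.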
What if we got the wrong lccn? And, by some really bad luck, it's actually a valid lccn in the system?
{
"error" : null,
"id" : "1",
"result" : {
"1" : [
{
"oclc" : [
"6861637"
],
"lccn" : [
"80024367"
],
"sdr" : {
"rights" : "searchonly",
"handle" : "mdp.39015000000482",
"mburl" : "http://hdl.handle.net/2027/mdp.39015000000482"
},
"isbn" : [
"0060404531",
"9780060404536"
],
"score" : 150,
"matchedItems" : 3,
"matchPercentage" : "75"
},
{
"oclc" : [
"4667523"
],
"lccn" : [
"77906307"
],
"sdr" : {
"rights" : "searchonly",
"handle" : "mdp.39015000000490",
"mburl" : "http://hdl.handle.net/2027/mdp.39015000000490"
},
"score" : 75,
"matchedItems" : 1,
"matchPercentage" : "25"
}
]
}
}
Here we get two records, pre-sorted by the server based on score, then (if necessary) by matchedItems.
The first has both a higher score and a higher number (and thus percentage) of matched items, and is therefore considered by the server to be the "best" match. The second has one good match -- lccn -- and is included just in case.
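The ordering described above can be reproduced on the client with a two-part sort key. This is a sketch, not the server's code: it assumes both keys sort in descending order, and the function name sort_matches is made up for illustration.

```python
def sort_matches(records):
    """Order records by score, then by matchedItems, highest first.

    Assumption: ties on score are broken by the larger matchedItems.
    """
    return sorted(
        records,
        key=lambda r: (r["score"], r["matchedItems"]),
        reverse=True,
    )


# Trimmed-down versions of the two records from the response above,
# deliberately listed in the wrong order.
records = [
    {"sdr": {"handle": "mdp.39015000000490"}, "score": 75, "matchedItems": 1},
    {"sdr": {"handle": "mdp.39015000000482"}, "score": 150, "matchedItems": 3},
]

best = sort_matches(records)[0]
print(best["sdr"]["handle"])
```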
It's up to the client to determine what threshold should represent the "worst still-usable" data. The server will always return all matches.
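A client-side cutoff might look like the sketch below. The function name usable_matches and the default threshold of 50 are arbitrary choices for illustration, not anything the server prescribes. Because matchPercentage appears as a number in one sample response and as a string in the other, the sketch converts it before comparing.

```python
def usable_matches(records, min_percentage=50):
    """Keep only records at or above a client-chosen threshold.

    matchPercentage may arrive as a number (100) or a string ("75"),
    so it is converted with float() before the comparison.
    """
    return [
        r for r in records
        if float(r["matchPercentage"]) >= min_percentage
    ]


# Trimmed-down versions of the two records from the response above.
records = [
    {"sdr": {"handle": "mdp.39015000000482"}, "score": 150,
     "matchedItems": 3, "matchPercentage": "75"},
    {"sdr": {"handle": "mdp.39015000000490"}, "score": 75,
     "matchedItems": 1, "matchPercentage": "25"},
]

kept = usable_matches(records)
print([r["sdr"]["handle"] for r in kept])
```

A client that prefers to rank on score rather than on percentage could apply the same pattern to the score field.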
The scoring weights, listed below, are essentially arbitrary at this point -- any feedback would be much appreciated.
| Index | Score | Example | Description |
| handle | 100 | mdp.39015000000482 | MDP Handle |
| oclc | 100 | 4667523 | OCLC Number |
| sdr | 100 | wu1000063 | SDR Member organization submitted code |
| lccn | 75 | 80024367 | Library of Congress Control Number |
| isbn | 25 | 0060404531 | 10- or 13-character ISBN |
| issn | 25 | 10000453 | ISSN |
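The scores in the sample responses are consistent with simply adding up the table weights for each identifier that matched: 100 + 75 + 25 + 25 = 225 for the full match, 100 + 25 + 25 = 150 when the lccn was wrong. The page does not spell out the formula, so the following is a sketch under that assumption; the function name summarize is hypothetical.

```python
# Weights copied from the table above.
WEIGHTS = {
    "handle": 100, "oclc": 100, "sdr": 100,
    "lccn": 75, "isbn": 25, "issn": 25,
}


def summarize(submitted, matched):
    """Compute score, matchedItems and matchPercentage for one record.

    Assumptions: the score is the sum of the weights of the matched
    identifiers, and the percentage is matched / submitted.
    Both arguments are lists of (index, value) pairs.
    """
    return {
        "score": sum(WEIGHTS[index] for index, _value in matched),
        "matchedItems": len(matched),
        "matchPercentage": 100 * len(matched) // len(submitted),
    }


# The query with the wrong (but valid) lccn from the second example.
submitted = [
    ("oclc", "6861637"),
    ("lccn", "77906307"),
    ("isbn", "0060404531"),
    ("isbn", "9780060404536"),
]

# First record: oclc and both isbns match. Second record: only the lccn.
first = summarize(submitted, [submitted[0], submitted[2], submitted[3]])
second = summarize(submitted, [submitted[1]])
print(first)
print(second)
```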
Links:
[1] http://www.hathitrust.org/phase0