Phase1
Phase 1: multiple identifiers for a single record
(Note that this builds on work done in Phase 0)
Often you'll have several supposedly unique identifiers at your disposal. With the Phase 1 code, you can send them all and get a set of scored records in the response.
Phase 1 Input Example
We start with a query that could easily have come from an OPAC web page (see below for all possible input parameters).
http://mirlyn.lib.umich.edu/cgi-bin/sdrsmd?id=1&oclc=6861637&lccn=80024367&isbn=0060404531&isbn=9780060404536
Here, we throw everything we know about this record -- oclc, lccn, and both the 10- and 13-character ISBNs -- at sdrsmd. What we get back is a record with a score:
{
   "error" : null,
   "id" : "1",
   "result" : {
      "1" : [
         {
            "oclc" : [
               "6861637"
            ],
            "lccn" : [
               "80024367"
            ],
            "sdr" : {
               "rights" : "searchonly",
               "handle" : "mdp.39015000000482",
               "mburl" : "http://hdl.handle.net/2027/mdp.39015000000482"
            },
            "isbn" : [
               "0060404531",
               "9780060404536"
            ],
            "score" : 225,
            "matchPercentage" : 100,
            "matchedItems" : 4
         }
      ]
   }
}
In addition to all the other information we know and love from Phase 0, we get three more items:
- score is the total score, as explained below in the Scoring section.
- matchedItems is the total number of items matched (in this case, one oclc number, one lccn, and two ISBNs).
- matchPercentage is the percentage of the identifiers you sent that match this record -- in this case, a perfect 4/4 for a percentage of 100.
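As a rough illustration of the request and the three new fields, here is a minimal Python sketch. The function names `build_query_url` and `summarize` are hypothetical, not part of the service. The sketch assumes that the result object is keyed by the id you sent (as in the sample response) and that a non-null error means the request failed. Numeric fields are coerced with `int()` because the sample responses on this page show matchPercentage both as a number and as a string.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the example request on this page.
BASE_URL = "http://mirlyn.lib.umich.edu/cgi-bin/sdrsmd"


def build_query_url(request_id, identifiers):
    """Build a query URL; identifiers maps an index name to a list of values."""
    params = [("id", request_id)]
    for index, values in identifiers.items():
        for value in values:
            # Repeated parameters (e.g. two isbn values) are sent once per value.
            params.append((index, value))
    return BASE_URL + "?" + urlencode(params)


def summarize(response_text, request_id):
    """Return (handle, score, matchedItems, matchPercentage) for each record."""
    response = json.loads(response_text)
    if response["error"] is not None:
        # Assumption: a non-null error field signals a failed request.
        raise RuntimeError(str(response["error"]))
    rows = []
    for record in response["result"][request_id]:
        rows.append((
            record["sdr"]["handle"],
            int(record["score"]),
            int(record["matchedItems"]),
            int(record["matchPercentage"]),
        ))
    return rows


url = build_query_url("1", {
    "oclc": ["6861637"],
    "lccn": ["80024367"],
    "isbn": ["0060404531", "9780060404536"],
})
```

Fetching `url` and passing the response body to `summarize(body, "1")` would then yield one tuple per returned record, in the order the server sent them.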
Phase 1 Input -- contradictory input data
What if we sent the wrong lccn and, by some really bad luck, it's actually a valid lccn for a different record in the system? The response then contains more than one record:
{
   "error" : null,
   "id" : "1",
   "result" : {
      "1" : [
         {
            "oclc" : [
               "6861637"
            ],
            "lccn" : [
               "80024367"
            ],
            "sdr" : {
               "rights" : "searchonly",
               "handle" : "mdp.39015000000482",
               "mburl" : "http://hdl.handle.net/2027/mdp.39015000000482"
            },
            "isbn" : [
               "0060404531",
               "9780060404536"
            ],
            "score" : 150,
            "matchedItems" : 3,
            "matchPercentage" : "75"
         },
         {
            "oclc" : [
               "4667523"
            ],
            "lccn" : [
               "77906307"
            ],
            "sdr" : {
               "rights" : "searchonly",
               "handle" : "mdp.39015000000490",
               "mburl" : "http://hdl.handle.net/2027/mdp.39015000000490"
            },
            "score" : 75,
            "matchedItems" : 1,
            "matchPercentage" : "25"
         }
      ]
   }
}
Here we get two records, pre-sorted by the server based on score, then (if necessary) by matchedItems.
The first has both a higher score and a higher number (and thus percentage) of matched items, and is therefore considered by the server to be the "best" match. The second has one good match -- lccn -- and is included just in case.
It's up to the client to determine what threshold should represent the "worst still-usable" data. The server will always return all matches.
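One way a client might apply such a threshold is sketched below in Python. The function name `usable_matches` and the cutoff values (`min_score=100`, `min_percentage=50`) are purely illustrative assumptions; pick whatever suits your data. The records are trimmed copies of the two in the response above, and the sketch re-sorts by score and then matchedItems so it does not depend on the order received.

```python
# Trimmed copies of the two records from the contradictory-input response.
records = [
    {"handle": "mdp.39015000000482", "score": 150, "matchedItems": 3, "matchPercentage": "75"},
    {"handle": "mdp.39015000000490", "score": 75, "matchedItems": 1, "matchPercentage": "25"},
]


def usable_matches(records, min_score=100, min_percentage=50):
    """Keep records at or above both cutoffs, best match first.

    The default cutoffs are hypothetical, chosen only for this example.
    """
    kept = [
        r for r in records
        # int() because matchPercentage appears as a string in some responses.
        if int(r["score"]) >= min_score and int(r["matchPercentage"]) >= min_percentage
    ]
    # Same ordering the server describes: score first, then matchedItems.
    return sorted(
        kept,
        key=lambda r: (int(r["score"]), int(r["matchedItems"])),
        reverse=True,
    )


best = usable_matches(records)
```

With these cutoffs only the first record survives; lowering both to 0 keeps everything the server returned.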
Input values and Scoring
The scoring weights are essentially arbitrary at this point -- any feedback would be much appreciated.
| Index | Score | Example | Description |
| handle | 100 | mdp.39015000000482 | MDP Handle |
| oclc | 100 | 4667523 | OCLC Number |
| sdr | 100 | wu1000063 | SDR Member organization submitted code |
| lccn | 75 | 80024367 | Library of Congress Control Number |
| isbn | 25 | 0060404531 | 10- or 13-character ISBN |
| issn | 25 | 10000453 | ISSN |
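The sample responses on this page are consistent with the score being the sum of the table weights for every identifier that matched (for example, 100 + 75 + 25 + 25 = 225). The Python sketch below reproduces that arithmetic. It is an assumption inferred from the samples, not the server's actual code. The function name `score_match` and the simplified record shape (index name mapped to a list of values) are hypothetical, and the "wrong lccn" request is assumed to have sent lccn 77906307, which this page does not show explicitly.

```python
# Weights copied from the scoring table.
SCORES = {"handle": 100, "oclc": 100, "sdr": 100, "lccn": 75, "isbn": 25, "issn": 25}


def score_match(sent, record):
    """Return (score, matchedItems, matchPercentage) for one candidate record.

    sent and record both map an index name to a list of values.
    """
    score = 0
    matched = 0
    total = 0
    for index, values in sent.items():
        for value in values:
            total += 1
            if value in record.get(index, []):
                score += SCORES[index]
                matched += 1
    percentage = round(100 * matched / total) if total else 0
    return score, matched, percentage


record_a = {"oclc": ["6861637"], "lccn": ["80024367"],
            "isbn": ["0060404531", "9780060404536"]}
record_b = {"oclc": ["4667523"], "lccn": ["77906307"]}

good_request = {"oclc": ["6861637"], "lccn": ["80024367"],
                "isbn": ["0060404531", "9780060404536"]}
# Assumed request for the contradictory example: same identifiers, wrong lccn.
bad_lccn_request = dict(good_request, lccn=["77906307"])

print(score_match(good_request, record_a))      # (225, 4, 100)
print(score_match(bad_lccn_request, record_a))  # (150, 3, 75)
print(score_match(bad_lccn_request, record_b))  # (75, 1, 25)
```

The three printed tuples agree with the score, matchedItems, and matchPercentage values in the two sample responses.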