The repository must store and track rights information for each digitized volume in HathiTrust, information that is used by mechanisms such as the page-turner access system. Some of the challenges in doing this are
a) modeling the rights information properly to ease maintenance,
b) ensuring accuracy in the semantics of rights, and
c) tracking the changes to rights information over time.
Frequent updates to millions of records, changes to database structure that make the database unavailable for long periods, or subtle changes over time in the meaning of millions of access control rules are some of the consequences of a poor design. Complicating these challenges is the need for flexibility to accommodate different types of rights information and to develop new access rules, including those that come as a result of negotiations with publishers and manual copyright clearance.
Copyright is complex, and although there have been good efforts to model and express copyright status information for library holdings, this project requires a practical and flexible approach. At best, any solution will be imprecise; we should expect this imprecision and hope to improve on the situation over time. The safest approach is to base the rights database, as much as possible, on simple, established copyright policy and terminology that is not likely to change. Attributes in the database should be deliberately defined in ways that are consistent with copyright policy. Using cataloging metadata, we can make a basic determination of rights by, for example, characterizing a publication as being in the public domain, in copyright, in copyright but out-of-print and brittle, or authoritatively copyright-orphaned. Unfortunately, the exceptions to these general principles do not necessarily follow any rule or pattern.
The MARC record format has no fields intended for the storage of rights information and cannot store this or similar information at the level of the volume (e.g., for multi-volume works). Like our colleagues at other institutions, we recommend that this information be stored in a separate, large-scale database.
Our strategy for storing rights information is two-pronged, based on the notion of two extensible sets of attributes: The first set of attributes characterizes the copyright status of the volume. Examples of this type of attribute are "public domain," "public domain when viewed in the U.S." and "in-copyright"; each attribute is only present when appropriate. The main benefits of this approach are
a) insulation from frequent change and
b) accuracy in legal terms.
The bulk of this rights information is bibliographically derived (by automated query) at the point of ingest, using the relatively stable criteria of country of publication, publication date, and status as a US federal government document. Over time, additional attributes will be defined and added as we identify out-of-print books, orphaned-copyright works, etc.
The second set of attributes does not characterize the volume's copyright status; instead, it directly specifies access control rules. These can be thought of as "overrides" to the first, more general set of attributes. An obvious example of this type of information is "available to UM affiliates".
For accurate representation, or because copyright status changes over time, some volumes may have more than one rights attribute. To simplify access decisions, however, rights attributes should be defined so that the most recent rights attribute is authoritative. Suppose, for example, that a volume initially classified as public domain is later discovered to be in copyright, and the copyright holder then grants explicit access to UM affiliates. In such a case, three attributes apply to the volume: its original public domain classification (kept as valuable history), its current status as in-copyright, and the explicit access control granted by the copyright holder. The last of these takes precedence; as mentioned above, explicit access controls can be thought of as "overrides" to the more general copyright status attributes.
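The "most recent attribute is authoritative" rule can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code: the `RightsEntry` record, its `kind` field, and the small integer timestamps are hypothetical, and the entries replay the example above, with `umall` (the attribute for UM affiliates) standing in for the copyright holder's explicit grant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RightsEntry:
    time: int   # hypothetical: any monotonically increasing timestamp
    attr: str   # attribute name, e.g. "pd", "ic", "umall"
    kind: str   # attribute type: "copyright" or "access"


def current_rights(history: list[RightsEntry]) -> RightsEntry:
    """Return the authoritative entry, which is simply the most recent one.

    Older entries stay in `history` as a record of past determinations.
    """
    return max(history, key=lambda entry: entry.time)


history = [
    RightsEntry(1, "pd", "copyright"),   # original bib-derived classification
    RightsEntry(2, "ic", "copyright"),   # later found to be in copyright
    RightsEntry(3, "umall", "access"),   # copyright holder grants UM access
]
print(current_rights(history).attr)  # umall
```

Nothing is ever overwritten: the earlier `pd` and `ic` entries remain available as history, and only the choice of which entry is current changes.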
At the core of the rights system is an algorithm that considers a) the copyright status and/or explicit access controls associated with the volume, b) the volume's digitizing agent (e.g., Google or the University of Chicago), and c) the identity of the user (if known) in order to determine access rights. The access rights may differ based on any of these criteria. Because most rights attributes will be static and will characterize the copyright status of the volume in general terms (e.g., "out-of-print and brittle"), the decision matrix underlying this algorithm can easily accommodate changes in rights over time. For example, we launched a service that prohibited access to in-copyright materials, even when those volumes were out-of-print and damaged; later, a change in policy granted in-library users access to those materials under the provisions of Section 108 of US copyright law. A change in access due to such a policy change requires only a simple change in the decision matrix, not updates to millions of volume records. Some volumes will have a series of rights attributes applied over time; for these volumes, the most recent rights attribute is used to determine access rights.
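A decision matrix of this kind might look like the following Python sketch. The user classes, the `"*"` wildcard meaning "any digitizing agent", the `"google"` source string, and the function name `can_view` are all hypothetical; the point is that the Section 108 policy change amounts to editing one cell of the matrix.

```python
# Hypothetical user classes; the real system's categories may differ.
ANYONE = "anyone"
US_USER = "us_user"
UM_AFFILIATE = "um_affiliate"
IN_LIBRARY = "in_library"

# Decision matrix: (attribute, source) -> user classes granted full view.
# "*" is a hypothetical wildcard for "any digitizing agent"; an exact
# (attribute, source) row, if present, is consulted first.
MATRIX: dict[tuple[str, str], set[str]] = {
    ("pd", "*"): {ANYONE},
    ("pdus", "*"): {US_USER},
    ("ic", "*"): set(),
    ("op", "*"): set(),
    ("umall", "*"): {UM_AFFILIATE, IN_LIBRARY},
    ("ic-world", "*"): {ANYONE},
    ("nobody", "*"): set(),
}


def can_view(attr: str, source: str, user_classes: set[str]) -> bool:
    """Look up the rule for this attribute and source; deny if none exists."""
    rule = MATRIX.get((attr, source), MATRIX.get((attr, "*"), set()))
    return ANYONE in rule or bool(rule & user_classes)


# Before the policy change, out-of-print volumes are closed to everyone.
assert not can_view("op", "google", {IN_LIBRARY})
# Section 108 policy change: one matrix cell changes, no volume records do.
MATRIX[("op", "*")] = {IN_LIBRARY}
assert can_view("op", "google", {IN_LIBRARY})
assert not can_view("op", "google", {US_USER})
```

Defaulting to an empty rule means that an attribute missing from the matrix denies access rather than granting it.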
As a volume's images are ingested, they are placed in storage, and the retrieval system sends the identifier to Mirlyn so that the item record can be created or updated. Mirlyn then performs a simple test for copyright status: (1) Was the volume published in the US or outside the US? (2) Depending on where it was published, was it published before the applicable cutoff date? (3) If published in the US, is the volume a US federal government publication? The appropriate attributes are then stored in the rights database. Once the volume is in storage and represented in the rights database, it is available through the access system. When an action is requested, the access system consults the rights database and, based on the most recent rights attribute, the source, and whether the user has authenticated, composes a list of allowable actions. It then either performs the requested action, prompts the user for authentication, or denies the action.
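The final step, turning the list of allowable actions into "perform", "prompt for authentication", or "deny", could be sketched as follows. The action names, the per-attribute action sets, and `handle_request` are hypothetical; the source is left out for brevity, and treating any authenticated user as eligible for `umall` is a simplification.

```python
# Hypothetical allowed actions per attribute: (anonymous users, authenticated users).
# The source (digitizing agent) is omitted here to keep the sketch short.
ACTIONS: dict[str, tuple[set[str], set[str]]] = {
    "pd": ({"view"}, {"view"}),
    "umall": (set(), {"view"}),
    "ic": (set(), set()),
}


def handle_request(action: str, attr: str, authenticated: bool) -> str:
    """Return "perform", "authenticate", or "deny" for the requested action."""
    anonymous, logged_in = ACTIONS.get(attr, (set(), set()))
    allowed = logged_in if authenticated else anonymous
    if action in allowed:
        return "perform"
    if not authenticated and action in logged_in:
        # Logging in would unlock this action, so prompt rather than refuse.
        return "authenticate"
    return "deny"


print(handle_request("view", "umall", authenticated=False))  # authenticate
print(handle_request("view", "umall", authenticated=True))   # perform
print(handle_request("view", "ic", authenticated=True))      # deny
```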
The following simplified, hypothetical rights examples help illustrate both the attributes applied and the rules interpreted:
a) Works published in the US before 1923, works published outside the US before 1870, and US federal government documents meet the criteria for the public domain. These are treated as public domain (ATTRIBUTE name=pd) based on bibliographically derived information (REASON name=bib). We do not restrict access to these materials.
b) Those texts that do not meet these criteria (e.g., US post-1923 and not a government document) are treated as in-copyright (i.e., ATTRIBUTE name=ic and REASON name=bib).
c) An additional attribute is used to represent works published outside the United States from 1870 to 1923, because copyright status for these works depends on the location of the user. Works published outside the US before 1923 are in the public domain in the US; however, because of variations in copyright law in other countries, it is estimated that works published in those countries as early as 1870 may still be under copyright there. Therefore, users accessing the volume from US IP addresses have access to works published outside the US between 1870 and 1923, but users with non-US IP addresses do not (ATTRIBUTE name=pdus and REASON name=bib).
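The bibliographic test (US works before 1923 and US federal government documents are `pd`, non-US works from 1870 to 1923 are `pdus`, everything else is `ic`) can be written as a small pure function. This sketch assumes the cutoffs are compared with `year < cutoff`, reading "prior to 1923" literally, and, as an assumption not stated in the text, returns `und` when no usable publication date is available; the name `bib_rights` is hypothetical.

```python
from typing import Optional

US_CUTOFF = 1923       # works published before this year are pd in the US
NON_US_CUTOFF = 1870   # earliest year a non-US work may still be in copyright abroad


def bib_rights(published_in_us: bool, year: Optional[int], us_fed_doc: bool) -> str:
    """Return the attribute assigned with reason "bib" for a volume."""
    if published_in_us and us_fed_doc:
        return "pd"    # US federal government documents, whatever their date
    if year is None:
        return "und"   # assumption: no usable date means undetermined
    if published_in_us:
        return "pd" if year < US_CUTOFF else "ic"
    if year < NON_US_CUTOFF:
        return "pd"    # before 1870: treated as public domain everywhere
    if year < US_CUTOFF:
        return "pdus"  # public domain only for users in the US
    return "ic"


print(bib_rights(True, 1950, us_fed_doc=True))    # pd
print(bib_rights(False, 1900, us_fed_doc=False))  # pdus
print(bib_rights(True, 1950, us_fed_doc=False))   # ic
```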
Over time, the rights status of any volume may be redetermined by a number of methods. For example, updates to the bibliographic record or manual copyright determination processes may change the rights status of a volume.
Some determinations are more authoritative than others. Updates to rights data must take that into account and enforce precedence so that the most recent, most authoritative determination is in effect.
We have identified four levels of precedence in the rights model we are currently using. These are, in order of increasing authority, as follows:
| precedence | rights type | reason code | examples |
| 1 (lowest) | copyright | bib | pd/bib, ic/bib, und/bib, pdus/bib |
| 2 | copyright | any not listed for levels 1, 3, or 4 | ic/unp, pd/ncn, pd/ren, pdus/gfv, ic/ren, und/nfi, pd/cdpp, ic/cdpp, pdus/cdpp, ic/add, pdus/add, pd/add, pd/exp, op/ipma, ic/ipma, und/ipma, ic/crms, pd/crms, und/crms, icus/gatt |
| 3 | any | pvt, con, ddd | ic-world/con, nobody/pvt, cc-by/con, cc-by-nd/con, cc-by-nc/con, cc-by-sa/con, cc-by-nc-nd/con, cc-by-nc-sa/con, cc-zero/con, und-world/con, orph/ddd, orphcand/ddd |
| 4 (highest) | any | man, del | pd/man, pdus/man, ic-world/man, und-world/man, ic/man, nobody/man, nobody/del (note: these are never allowed in automatic rights updates and should have a corresponding explanation in the 'note' field) |
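In this model, rights of a given precedence should be superseded only by rights of equal or greater precedence. For example, an ic/ren determination (precedence 2) may replace an active pd/bib determination (precedence 1), but a later pd/bib determination may not replace an active ic/ren one.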
With one exception, rights update processes should insert new rights information for a given volume when the newly supplied rights status has precedence equal to or greater than that of the active (latest) rights status for that volume.
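The precedence rule for automatic updates can be sketched as follows. The level assignments follow the precedence table above; treating precedence as a function of the reason code alone, and placing `ddd` and `del` according to the examples listed at levels 3 and 4, is our reading of that table. The access-control exception described next is not modeled, and the function names are hypothetical.

```python
# Reason codes grouped into the four precedence levels of the table above.
# Assumption: the level depends only on the reason code, with "ddd" and "del"
# placed according to the examples listed at levels 3 and 4.
LEVEL_4 = {"man", "del"}
LEVEL_3 = {"pvt", "con", "ddd"}
LEVEL_1 = {"bib"}


def precedence(reason: str) -> int:
    """Map a reason code to its precedence level (1 = lowest, 4 = highest)."""
    if reason in LEVEL_4:
        return 4
    if reason in LEVEL_3:
        return 3
    if reason in LEVEL_1:
        return 1
    return 2  # any other copyright determination


def accept_automatic_update(new: tuple[str, str], active: tuple[str, str]) -> bool:
    """Should an automatic process record `new` over the `active` rights?

    Both arguments are (attribute, reason) pairs, e.g. ("ic", "ren").
    Level-4 rights are never supplied by automatic updates, so they are refused.
    """
    if precedence(new[1]) == 4:
        return False
    return precedence(new[1]) >= precedence(active[1])


print(accept_automatic_update(("ic", "ren"), ("pd", "bib")))  # True
print(accept_automatic_update(("pd", "bib"), ("ic", "ren")))  # False
```

Equal precedence is accepted, so a newer determination from the same kind of process always replaces an older one.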
The exception to this rule is as follows: because access controls take precedence over other rights attributes, they must always remain in effect. However, we will not discard manually determined rights information for those volumes, because we plan to eventually lift some access controls (for example, those blocking volumes whose images show private information). Our system does not currently handle this gracefully; as a workaround, when new rights data for such a volume is inserted, the access control is immediately given a later timestamp so that it remains in effect. In the near future, this will allow us to remove the access control, at which point the most recent rights determination will take effect.
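The timestamp workaround can be illustrated with Python's built-in `sqlite3` module. This is a simplified, hypothetical stand-in for the real `rights_log` table: attributes and reasons are stored by name rather than numeric id, times are small integers, and `insert_rights`, `latest`, and the volume id `v1` are invented for the example.

```python
import sqlite3

# Access-type attribute names, taken from the attributes table (type = access).
ACCESS_ATTRS = {"umall", "ic-world", "nobody", "und-world"}


def latest(db, ns, vid):
    """Most recent (attr, reason, time) row for a volume, or None."""
    return db.execute(
        "SELECT attr, reason, time FROM rights_log "
        "WHERE namespace = ? AND id = ? ORDER BY time DESC LIMIT 1",
        (ns, vid),
    ).fetchone()


def insert_rights(db, ns, vid, attr, reason, now):
    """Record new rights; if an access control was in effect, re-stamp it."""
    active = latest(db, ns, vid)
    db.execute("INSERT INTO rights_log VALUES (?, ?, ?, ?, ?)", (ns, vid, attr, reason, now))
    if active is not None and active[0] in ACCESS_ATTRS:
        # Copy the access control with a later timestamp so it stays in effect.
        db.execute(
            "INSERT INTO rights_log VALUES (?, ?, ?, ?, ?)",
            (ns, vid, active[0], active[1], now + 1),
        )


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE rights_log (namespace TEXT, id TEXT, attr TEXT, reason TEXT, "
    "time INTEGER, PRIMARY KEY (namespace, id, time))"
)
db.execute("INSERT INTO rights_log VALUES ('mdp', 'v1', 'ic', 'bib', 1)")
db.execute("INSERT INTO rights_log VALUES ('mdp', 'v1', 'nobody', 'pvt', 2)")

insert_rights(db, "mdp", "v1", "pd", "ren", 10)
print(latest(db, "mdp", "v1"))  # ('nobody', 'pvt', 11)

# Later, lifting the block lets the newest determination take effect.
db.execute("DELETE FROM rights_log WHERE namespace = 'mdp' AND id = 'v1' AND attr = 'nobody'")
print(latest(db, "mdp", "v1"))  # ('pd', 'ren', 10)
```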
Note that rights may be supplied and re-supplied by manual copyright determination processes. Under these rules, those rights all fall within the same level of precedence, so each new determination supersedes the previous one; a more recent determination is assumed to be a more accurate one.
Manual access controls are ignored by rights updates and must currently be entered manually by an administrator.
In addition to building the model, the following requirements have been identified to ensure proper enforcement of rights:
| namespace | id | attr | reason | source | user | time | note |
| mdp | 39015054477651 | 1 | 1 | 1 | root | 2006-01-12 11:34:26 | |
| mdp | 39015017678577 | 1 | 1 | 1 | root | 2006-01-12 11:34:27 | |
| mdp | 39015017678577 | 4 | 4 | 1 | sooty | 2006-02-08 15:18:24 | determined by jaheim as in-copyright, but orphaned |
| mdp | 39015034781842 | 2 | 1 | 1 | root | 2006-01-12 11:34:28 | |
| mdp | 39015034781842 | 7 | 3 | 1 | pwillett | 2006-03-08 09:12:45 | agreement reached with publisher for open access |
| namespace | id | attr | reason | source | user | time | note |
| mdp | 39015054477651 | 1 | 1 | 1 | root | 2006-01-12 11:34:26 | |
| mdp | 39015017678577 | 4 | 4 | 1 | sooty | 2006-02-08 15:18:24 | determined by jaheim as in-copyright, but orphaned |
| mdp | 39015034781842 | 7 | 3 | 1 | pwillett | 2006-03-08 09:12:45 | agreement reached with publisher for open access |
| id | name | type | dscr |
| 1 | pd | copyright | public domain |
| 2 | ic | copyright | in-copyright |
| 3 | op | copyright | out-of-print (implies in-copyright) |
| 4 | orph | copyright | copyright-orphaned (implies in-copyright) |
| 5 | und | copyright | undetermined copyright status |
| 6 | umall | access | available to UM affiliates and walk-in patrons (all campuses) |
| 7 | ic-world | access | in-copyright and permitted as world viewable by the copyright holder |
| 8 | nobody | access | available to nobody; blocked for all users |
| 9 | pdus | copyright | public domain only when viewed in the US |
| 10 | cc-by | copyright | Creative Commons Attribution license |
| 11 | cc-by-nd | copyright | Creative Commons Attribution-NoDerivatives license |
| 12 | cc-by-nc-nd | copyright | Creative Commons Attribution-NonCommercial-NoDerivatives license |
| 13 | cc-by-nc | copyright | Creative Commons Attribution-NonCommercial license |
| 14 | cc-by-nc-sa | copyright | Creative Commons Attribution-NonCommercial-ShareAlike license |
| 15 | cc-by-sa | copyright | Creative Commons Attribution-ShareAlike license |
| 16 | orphcand | copyright | orphan candidate - in 90-day holding period (implies in-copyright) |
| 17 | cc-zero | copyright | Creative Commons Zero license (implies pd) |
| 18 | und-world | access | undetermined copyright status and permitted as world viewable by the depositor |
| 19 | icus | copyright | in copyright in the US |
| id | name | dscr |
| 1 | bib | bibliographically-derived by automatic processes |
| 2 | ncn | no printed copyright notice |
| 3 | con | contractual agreement with copyright holder on file |
| 4 | ddd | due diligence documentation on file |
| 5 | man | manual access control override; see note for details |
| 6 | pvt | private personal information visible |
| 7 | ren | copyright renewal research was conducted |
| 8 | nfi | needs further investigation (copyright research partially complete; an ambiguous, unclear, or other time-consuming situation was encountered) |
| 9 | cdpp | title page or verso contains copyright date and/or place of publication information not in bib record |
| 10 | ipma | in-print and market availability research was conducted |
| 11 | unp | unpublished work |
| 12 | gfv | Google viewability set at VIEW_FULL |
| 13 | crms | derived from multiple reviews in the Copyright Review Management System (CRMS) via an internal resolution policy; consult CRMS records for details |
| 14 | add | author death date research was conducted or notification was received from authoritative source |
| 15 | exp | expiration of copyright term for non-US work with corporate author |
| 16 | del | deleted from the repository; see note for details |
| 17 | gatt | non-US public domain work restored to in-copyright in the US by GATT |
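Because `rights_log` stores numeric ids, displaying a row in the attr/reason notation used in the precedence table needs a lookup. In the database this would be a join against the `attributes` and `reasons` tables; the Python sketch below hard-codes the same id-to-name pairs from the two tables above, and `label` is a hypothetical helper name.

```python
# id -> name, copied from the attributes and reasons tables above.
ATTRIBUTES = {
    1: "pd", 2: "ic", 3: "op", 4: "orph", 5: "und", 6: "umall", 7: "ic-world",
    8: "nobody", 9: "pdus", 10: "cc-by", 11: "cc-by-nd", 12: "cc-by-nc-nd",
    13: "cc-by-nc", 14: "cc-by-nc-sa", 15: "cc-by-sa", 16: "orphcand",
    17: "cc-zero", 18: "und-world", 19: "icus",
}
REASONS = {
    1: "bib", 2: "ncn", 3: "con", 4: "ddd", 5: "man", 6: "pvt", 7: "ren",
    8: "nfi", 9: "cdpp", 10: "ipma", 11: "unp", 12: "gfv", 13: "crms",
    14: "add", 15: "exp", 16: "del", 17: "gatt",
}


def label(attr_id: int, reason_id: int) -> str:
    """Render an (attr, reason) id pair in attr/reason notation."""
    return f"{ATTRIBUTES[attr_id]}/{REASONS[reason_id]}"


# (attr, reason) pairs from the rights_log example rows.
print(label(2, 1))  # ic/bib
print(label(4, 4))  # orph/ddd
print(label(7, 3))  # ic-world/con
```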
| id | name | dscr |
| 1 | | |
| 2 | lit-dlps-dc | Library IT, Digital Library Production Service, Digital Conversion |
| 3 | ump | University of Michigan Press |
| 4 | ia | Internet Archive |
| 5 | yale | Yale University |
| 6 | umn | University of Minnesota |
| 7 | mhs | Minnesota Historical Society |
| 8 | usup | Utah State University Press |
| 9 | ucm | Universidad Complutense de Madrid |
| 10 | purd | Purdue University |
| 11 | getty | Getty Research Institute |
| 12 | um-dc-mp | University of Michigan, Duderstadt Center, Millennium Project |
| 13 | uiuc | University of Illinois at Urbana-Champaign |
| 14 | brooklynmuseum | Brooklyn Museum |
CREATE TABLE rights_log (
namespace VARCHAR(8) NOT NULL,
id VARCHAR(32) NOT NULL,
attr TINYINT NOT NULL,
reason TINYINT NOT NULL,
source TINYINT NOT NULL,
user VARCHAR(32) NOT NULL,
time TIMESTAMP NOT NULL default CURRENT_TIMESTAMP,
note TEXT,
PRIMARY KEY (namespace, id, time)
);
CREATE TABLE rights_current (
namespace VARCHAR(8) NOT NULL,
id VARCHAR(32) NOT NULL,
attr TINYINT NOT NULL,
reason TINYINT NOT NULL,
source TINYINT NOT NULL,
user VARCHAR(32) NOT NULL,
time TIMESTAMP NOT NULL default CURRENT_TIMESTAMP,
note TEXT,
PRIMARY KEY (namespace, id)
);
-- Copy every inserted or updated rights_current row into rights_log.
CREATE TRIGGER ins_rights AFTER INSERT ON rights_current FOR EACH ROW insert into rights_log values(new.namespace, new.id, new.attr, new.reason, new.source, new.user, new.time, new.note);
CREATE TRIGGER upd_rights AFTER UPDATE ON rights_current FOR EACH ROW insert into rights_log values(new.namespace, new.id, new.attr, new.reason, new.source, new.user, new.time, new.note);
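To check the `rights_current`/`rights_log` design end to end, the sketch below translates it to SQLite using Python's built-in `sqlite3` module (SQLite trigger bodies need `BEGIN ... END`) and replays the example rows shown earlier. Column types are simplified (INTEGER codes, TEXT timestamps) and the lookup tables are omitted, so this is an illustration rather than the production schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE rights_log (
    namespace TEXT NOT NULL, id TEXT NOT NULL, attr INTEGER NOT NULL,
    reason INTEGER NOT NULL, source INTEGER NOT NULL, user TEXT NOT NULL,
    time TEXT NOT NULL, note TEXT,
    PRIMARY KEY (namespace, id, time)
);
CREATE TABLE rights_current (
    namespace TEXT NOT NULL, id TEXT NOT NULL, attr INTEGER NOT NULL,
    reason INTEGER NOT NULL, source INTEGER NOT NULL, user TEXT NOT NULL,
    time TEXT NOT NULL, note TEXT,
    PRIMARY KEY (namespace, id)
);
-- Every insert or update of a current row is copied into the log.
CREATE TRIGGER ins_rights AFTER INSERT ON rights_current FOR EACH ROW BEGIN
    INSERT INTO rights_log VALUES (NEW.namespace, NEW.id, NEW.attr, NEW.reason,
                                   NEW.source, NEW.user, NEW.time, NEW.note);
END;
CREATE TRIGGER upd_rights AFTER UPDATE ON rights_current FOR EACH ROW BEGIN
    INSERT INTO rights_log VALUES (NEW.namespace, NEW.id, NEW.attr, NEW.reason,
                                   NEW.source, NEW.user, NEW.time, NEW.note);
END;
""")

# Initial bib-derived rows (reason 1, source 1) from the example log.
initial = [
    ("39015054477651", 1, "2006-01-12 11:34:26"),
    ("39015017678577", 1, "2006-01-12 11:34:27"),
    ("39015034781842", 2, "2006-01-12 11:34:28"),
]
for vid, attr, ts in initial:
    db.execute(
        "INSERT INTO rights_current VALUES ('mdp', ?, ?, 1, 1, 'root', ?, NULL)",
        (vid, attr, ts),
    )

# Later determinations replace the current row; the trigger logs them.
updates = [
    ("39015017678577", 4, 4, "sooty", "2006-02-08 15:18:24",
     "determined by jaheim as in-copyright, but orphaned"),
    ("39015034781842", 7, 3, "pwillett", "2006-03-08 09:12:45",
     "agreement reached with publisher for open access"),
]
for vid, attr, reason, user, ts, note in updates:
    db.execute(
        "UPDATE rights_current SET attr = ?, reason = ?, user = ?, time = ?, note = ? "
        "WHERE namespace = 'mdp' AND id = ?",
        (attr, reason, user, ts, note, vid),
    )

print(db.execute("SELECT COUNT(*) FROM rights_log").fetchone()[0])      # 5
print(db.execute("SELECT COUNT(*) FROM rights_current").fetchone()[0])  # 3
```

Each write to `rights_current` leaves a copy in `rights_log`, so the log holds all five historical rows while `rights_current` holds one row per volume, matching the two example tables.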
CREATE TABLE attributes (
id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE KEY,
type ENUM('access','copyright') NOT NULL,
name VARCHAR(16) NOT NULL,
dscr TEXT NOT NULL);
CREATE TABLE reasons (
id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE KEY,
name VARCHAR(16) NOT NULL,
dscr TEXT NOT NULL);
CREATE TABLE sources (
id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE KEY,
name VARCHAR(16) NOT NULL,
dscr TEXT NOT NULL);