Update on March 2008 Activities
Shared Digital Repository
11 April 2008
This is the first regular update on activities in the Shared Digital Repository (SDR). These updates will be issued monthly, typically on the second Friday of the month, and will report on the general health of the repository and on the development of the SDR. Each update will be sent by email to an official representative (typically the library director) of each participating institution and will be posted on the SDR website. We plan to make an RSS feed of the updates available soon, in order to share the information as broadly as possible.
- A final agreement was concluded with the CIC. The text of the agreement was provided to the CIC library directors by the CIC Center for Library Initiatives. We are currently collecting signatures at the University of Michigan and the CIC.
- The executive management committee of the SDR, consisting of the two library deans and CIOs from Indiana University and the University of Michigan, as well as John Wilkin, met on April 10th to discuss a variety of issues ranging from SDR finances to development priorities. The group will meet monthly and will focus primarily on broad future directions for the SDR. A date is being sought for a meeting of the SDR’s Operational Advisory Board, during which we expect to address a variety of issues, including coordination between the SDR and the CIC.
- We have begun conversations with several institutions outside of the CIC about their possible participation in the SDR. We hope to provide information on our progress in this regard in future Updates.
Growth of the SDR
As of April 11th, the SDR contains:
- 1,102,090 volumes
- 781,934 titles
- approximately 385 million pages
- 209,073 individual volumes in the public domain (19.0% of total)
No certification process currently exists to ascertain a digital repository’s fitness for long-term curatorial responsibility. We are nevertheless working to ensure a high degree of transparency about how the SDR meets its archiving responsibilities. Content that we ingest is intensively reviewed to ensure that it is valid and was not altered in transmission, and we are developing regular routines that re-validate content against stored checksums. We have also undertaken two efforts to communicate our readiness for long-term archiving responsibility. First, we have completed a draft response to the required elements in the Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist, and will post a preliminary version of our response on the SDR website in the near future. Second, we coordinated a site visit by a team from the Digital Repository Audit Method Based on Risk Assessment (DRAMBORA) effort in the European Union. Their report, which provides an extremely favorable review of the SDR, will be made public soon.
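To illustrate the kind of re-validation routine mentioned above, the following Python sketch recomputes a digest for each stored file and compares it with a previously recorded value. It is a minimal sketch under stated assumptions, not the SDR's actual tooling: the choice of SHA-256, the manifest held as a simple mapping from relative path to digest, and the function names `file_digest` and `revalidate` are all hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def revalidate(root: Path, stored: dict[str, str], algorithm: str = "sha256") -> list[str]:
    """Return the relative paths that are missing or no longer match their stored digest.

    `stored` is an assumed manifest format: relative path -> hex digest
    recorded at ingest time.
    """
    failures = []
    for relative_path, expected in sorted(stored.items()):
        target = root / relative_path
        if not target.is_file() or file_digest(target, algorithm) != expected:
            failures.append(relative_path)
    return failures
```

An empty list from `revalidate` means every file listed in the manifest is present and unchanged. Any path it returns would need to be investigated or restored from another copy.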
- Basic hardware deployment: Hardware infrastructure for the SDR is currently in place at the University of Michigan and is now being used to ingest content into the SDR and deliver content to users. We have also already purchased a second instance of the storage system, and once testing with data synchronization is complete, plans will be developed to move that system to Indiana University and to build out a second instance of servers there. Once the second instance is moved to Bloomington we will employ load balancing and failover to ensure greater availability. To users, this will appear to be a single site in a single location. Finally, we should note that a tape back-up of files is written to a third location, providing an important additional level of reliability for the content.
- Deployment issues: A number of important issues were explored and addressed in the release of the new hardware. One significant issue we encountered in deploying the content was the high overhead of the storage system’s filesystem-based redundancy, given the very large number of small files, resulting in a doubling (!) of storage consumption. By adapting the repository structure to a model that uses a single package per volume (we are currently running processes to bundle all of the page images and text files for each volume into a single archive file), we will eliminate the impact of that overhead. At the same time, we are leveraging this repository-wide processing to incorporate important PREMIS preservation metadata in the METS files that document the individual volumes, strengthening the preservation orientation of the archive.
- Institution-specific pageturner: We have deployed an early-release version of an institution-specific pageturner for viewing content in the SDR. That is, persons coming from University of Michigan IP addresses read books in an MBooks interface with Michigan’s colors and logo, while persons coming from a University of Wisconsin IP address see an interface whose features and logos are controlled by staff at Wisconsin. Many issues remain to be worked through, but we are currently testing the mechanisms with Wisconsin and will begin broader roll-out and testing in May.
- Services for visually disabled users: Michigan has deployed a mechanism that allows students certified by its Office of Services for Students with Disabilities to stream the text of in-copyright works to a screen reader. This is an early release of these mechanisms and we are not yet able to extend the model to certified users at other institutions, but we are pleased with the design, which was developed collaboratively by the University of Michigan library, General Counsel, Office of Services for Students with Disabilities, and the National Federation of the Blind. We will work with the Operational Advisory Board to plan a strategy for making the mechanisms available to users at other institutions so that those users may read public domain works, and to explore the question of access to in-copyright works.
- Other developments: We have deployed development versions of multi-page PDF rendering and of a collection builder (i.e., a mechanism that allows a user to bring together and search across a defined body of materials). We are exploring large-scale full-text indexing using Solr, and are trying to collect meaningful benchmarking data for different sizes of aggregated data. We are currently working with Wisconsin to define and test initial bibliographic ingest mechanisms. We will also soon release a mechanism for distributing bibliographic information about the contents of the SDR to participating libraries so that they may enhance or add records to their catalogs. In future updates, we will provide more detailed information on each of these developments.
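The single-package-per-volume model described under "Deployment issues" can be illustrated with a short Python sketch that bundles all of a volume's page images and text files into one archive file. This is only an illustration under assumptions: the update does not name the archive format, so the use of ZIP, the uncompressed storage mode, and the function name `bundle_volume` are hypothetical.

```python
import zipfile
from pathlib import Path


def bundle_volume(volume_dir: Path, archive_path: Path) -> int:
    """Pack every regular file under volume_dir into one archive; return the file count."""
    # The file list is built before the archive is opened, so the archive
    # itself is never picked up as a member.
    members = sorted(p for p in volume_dir.rglob("*") if p.is_file())
    # ZIP_STORED (no compression) is an assumption: page images are
    # typically already compressed, so recompressing gains little.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for member in members:
            # Store paths relative to the volume directory.
            archive.write(member, arcname=member.relative_to(volume_dir).as_posix())
    return len(members)
```

After bundling, the storage system holds one file per volume instead of many small ones, which is the property the restructuring relies on to avoid the per-file overhead.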
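The institution-specific pageturner selects an interface based on the IP address a reader is coming from. A minimal Python sketch of that kind of lookup follows. It is not the SDR's implementation: the network ranges are reserved documentation addresses rather than real campus networks, and the names `BRANDING_BY_NETWORK`, `DEFAULT_BRANDING`, and `branding_for` are hypothetical.

```python
import ipaddress

# Hypothetical mapping from network range to interface branding.
# These are RFC 5737 documentation ranges, not real institutional networks.
BRANDING_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "michigan",
    ipaddress.ip_network("198.51.100.0/24"): "wisconsin",
}
DEFAULT_BRANDING = "default"


def branding_for(client_ip: str) -> str:
    """Return the branding key for the first network containing client_ip."""
    address = ipaddress.ip_address(client_ip)
    for network, branding in BRANDING_BY_NETWORK.items():
        if address in network:
            return branding
    # Readers from unlisted networks get the generic interface.
    return DEFAULT_BRANDING
```

For example, `branding_for("192.0.2.17")` selects the "michigan" entry in this illustrative table, while an address outside every listed range falls back to the default interface.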
Status/availability of the SDR
Each month in this update, we will provide information on planned outages so that scheduled activities (e.g., classes and presentations) can work around these times. When it is necessary to interrupt availability of the SDR, we will schedule:
- major work on Friday evenings (8pm-1am) and Sunday mornings (5am-10am);
- minor work on weekdays from 6:30am to 8am (all times are Eastern).
We will collect email addresses for people who should receive advance notification of planned outages as well as.
- There are no planned outages for April and May.
- There were no interruptions in service in March.