Shared Digital Repository
Update on May 2008 Activities
13 June 2008
This is the third regular update on activities in the Shared Digital Repository (SDR). These updates will be made available monthly, typically on the second Friday of the month, and will provide a variety of information about the general health of the repository and the development of the SDR. Each update will be sent by e-mail to the Library Director and CIO at each participating institution and will be posted on the SDR website. We plan to make an RSS feed for the updates available soon, in order to share the information as broadly as possible.
Throughout this update, where a work item relates to the draft Short-Term and Long-Term Functional Objectives (being articulated by the CIC’s SDR committee), we note the relevant Objectives in parentheses.
- The final agreement between the CIC and the Repository Administrators was concluded and signed.
- The executive management committee of the SDR meets monthly and continues to work on a variety of issues, ranging from SDR finances to development priorities. We are working to schedule the first meeting of the Operational Advisory Board, which includes representation from the CIC.
- We continue to have productive conversations with several other institutions about possible participation in the SDR and hope to report on our progress in this regard in future updates.
Growth of the SDR
As of June 2nd, the SDR contains:
- 1,171,809 volumes
- 840,007 titles
- Approximately 410 million pages
- 219,908 individual volumes in the public domain (approximately 19% of the total)
In the coming weeks, we will release a draft response to the required elements in the “Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist.” As mentioned in an earlier update, we coordinated a site visit by a team from the Digital Repository Audit Method Based on Risk Assessment (DRAMBORA) effort in the European Union. Their report, which is highly favorable to the SDR, should be released publicly soon. (CIC SDR Short-Term Functional Objectives)
- Basic hardware deployment: Staff members at the University of Michigan and Indiana University have exchanged detailed specifications for deploying the redundant site at IU and have established a monthly conference call for coordination. IU has assigned staff to the project and is now working locally on site preparations. We expect to deploy equipment in two stages: in early summer, a rack of new servers; and in early fall, the second instance of storage (now online at UM and ready to be populated with content).
- Deployment issues: The bundling of page images and text files into a single file per volume is 95% complete as of this writing and should be finished by the end of June. We will then begin testing replication of content from the first instance of storage to the second.
- Ingesting Wisconsin content: The test ingests conducted in late April were mostly successful; some small bugs were identified and corrected in May. The first batch of Wisconsin bibliographic records was loaded, and internal processes that depend on bibliographic records (primarily link creation and rights generation) were tested. Routine processes for loading Wisconsin bibliographic records will begin running in June; once these are in place, routine ingest of page images will follow. We will also double the number of ingest servers in June to increase the overall ingest rate.
- Large-scale search: The production Solr configuration is in place and is being tested in conjunction with the new Collection Builder system (see below). We hope soon to begin sharing performance figures and our approaches to benchmarking search over large bodies of text. (CIC SDR Long-Term Functional Objectives)
- Institution-specific pageturner: A version of the institution-specific pageturner that supports limited branding is now in production at Wisconsin. Screenshots comparing the Michigan and Wisconsin views are attached. We will soon deliver XSL and CSS to Wisconsin so that staff there can begin modifying the display. (CIC SDR Short-Term Functional Objectives)
- Services for visually impaired users: We have released a new interface for visually impaired users (optimized for use with JAWS and other screen readers) that presents the entire text version of a volume, with navigation, on a single screen. This summer, two School of Information interns are working with us to further optimize this interface for screen readers and to improve the general accessibility of the pageturner. (CIC SDR Short-Term Functional Objectives)
- Fedora programmer: We continue to search for a programmer to help implement Fedora in conjunction with the SDR. The position will soon be reposted with a broader systems engineering job description; testing Fedora will be the first project for the person hired. (CIC SDR Long-Term Functional Objectives)
- Collection Builder: We have completed substantial work on a Collection Builder, which should allow users and staff to “publish virtual collections” (CIC SDR Short-Term Functional Objectives). We plan to release the Collection Builder into production the week of June 16th and to conduct testing during the last two weeks of June.
- 10-page PDF chunks: This functionality is now in production.
- API development: Chicago, Wisconsin, and Northwestern have all expressed interest in testing an API that provides bibliographic, access, and rights information for SDR materials on request (similar to the functionality of the Google API). They are currently evaluating Michigan's internal API and will provide input on additions or changes to functionality that can be incorporated into a version tuned for SDR participants.
- Distributing bibliographic information about the contents of the SDR: Bibliographic information about the initial Wisconsin materials ingested into the SDR is now available as part of the MBooks OAI set. Note, however, that this OAI set contains information only about public domain materials. We continue to work toward a more effective means of disseminating comprehensive information about all content in the SDR.
Forecasting June development
- double the number of ingest servers;
- initiate routine ingest of Wisconsin materials;
- release Collection Builder;
- make substantial progress on the development of a comprehensive distribution mechanism for bibliographic data about all SDR materials, public domain and restricted;
- release a draft response to the required elements in the “Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist.”
Status/availability of the SDR
We schedule maintenance work that requires a system outage during the following windows (Eastern time), when academic use is generally lowest:
- For major work, Friday evenings (8pm-1am) and Sunday mornings (5am-10am);
- For minor work, weekdays from 6:30am-8am.
Notice of scheduled outages is sent on business days, at least 24 hours in advance. Notice of unscheduled outages is sent upon discovery, with additional updates as appropriate.
Please contact Phyllis White (pmwhite at umich.edu) with email addresses of individuals or groups that should be added to our system outage mailing list.
There were no interruptions in service in May.
At this time, the following outages are scheduled:
- June: The brief outage originally anticipated for May (for a storage system software upgrade) will be scheduled for a date in June.
- July: No outages are planned at this time.