Reflections on the First HathiTrust Member Meeting

By Mike Furlough, Executive Director, HathiTrust

Since I started as Executive Director of HathiTrust in May of this year, I have done nothing but learn: learn about the organization, our operations, our finances, our people, and our partnership. I have traveled quite a bit, especially this fall, paying visits to HathiTrust members (thank you, libraries of Carnegie Mellon, Pittsburgh, Harvard, and Northwestern), several meetings of library organizations and consortia (thank you TRLN, GWLA, COPPUL, and ASERL), as well as a couple of special focus meetings on digital humanities and newspaper digitization (thanks to you all as well).

Although I usually give a talk about HathiTrust during these trips, I consider these listening visits, not proselytizing ones. I am there to learn about the turning points at which these organizations find themselves today and what strategic issues they are focusing on. And through their questions I find out what matters to them about HathiTrust, or what would matter to them more if we were to tackle this problem or that. It’s also useful to find out what people don’t know or don’t understand, because it sometimes means that users may not be getting the full benefit of HathiTrust.

The standout event of my first six months was the 2014 HathiTrust Members Meeting, held in Washington, DC on October 11. This was the first meeting of our membership since the 2011 Constitutional Convention, after which we developed our new governance structure, and adopted our current financial model. This was a unique chance to bring our partners together to update them on our current initiatives and engage them to begin planning for the future. Evident throughout the day were the membership’s strong sense of shared responsibility for the success of HathiTrust and the excitement for what we have done and will do together.

Here I’d like to offer some reflections on the day’s discussions, highlighting a few specific initiatives and some questions about where we are going as an organization. There’s no way in a short blog post to cover every single issue that came up that day, let alone in the last six months, so I hope you will forgive omissions for the sake of brevity and post questions or contact me directly. If you are interested, we have a more detailed report on the Member Meeting, along with slides of most of the presentations.

First of all, the partnership is strong and continues to grow. After the 2011 Convention our membership increased from 64 to 101 member libraries and now includes four in Canada, one in Spain, and one in Australia. Over 60 individuals from 30 different member libraries currently serve on a HathiTrust working group, standing committee, or governance committee. Our infrastructure is strong and we have repeatedly confirmed that our work is grounded solidly in the law. A growing number of our members have identified staff to work with us to obtain access to the HathiTrust collection for users who have print disabilities. The collection has grown significantly. At the moment I am writing this we stand at 12.96 million volumes, 4.8 million of which are open for full-text access because they are no longer covered by copyright or because an author or publisher has made the material available using a Creative Commons license. We have struck some outstanding partnerships in the last several years, including one with the Digital Public Library of America, which is now a notable source of viewers and readers of HathiTrust collections. In short, our preservation and access services provide a very solid basis for future work.

And that future work will continue to transform how libraries serve their users and manage collections. During its inaugural year the Program Steering Committee (PSC) launched working groups to plan programs passed as ballot initiatives at our 2011 Constitutional Convention. One of these, a proposal to develop a shared and distributed print monographs archive, will promote collective and coherent decisions about the retention and long-term management of print collections. By organizing a distributed print collection corresponding to the HathiTrust digital collection, we can strengthen our preservation commitments and better ensure future access to the cultural record. The working group studying these issues will make its first recommendations in early 2015. The chair of this group, Tom Teper of the University of Illinois, Urbana Champaign, reported on its work at the Member Meeting.

We have already taken action on another proposal from the Convention, one to expand and enhance access to US federal government publications. In 2013 we began the development of a Registry of US Federal Government Documents. More recently, the Government Documents Initiative Planning and Advisory Working Group, led by Mark Sandler of the Committee on Institutional Cooperation, has made preliminary recommendations that are now under review by the Program Steering Committee. Currently HathiTrust holds over 575,000 known US federal publications, but we believe there to be a substantial number of unidentified documents in the collection, and a much larger number of documents left undigitized. The recommendations of the Advisory Working Group include several that will strengthen the Registry project, and others that will help us to identify, source, and collect federal documents over the next several years. Mark Sandler also provided a report at the Washington meeting.

Stephen Downie of the Graduate School of Library and Information Science at the University of Illinois, Urbana Champaign, reported on the HathiTrust Research Center. Our goal in supporting the Research Center is to simplify advanced computational access to our digital collection through services and infrastructure developed by experts. Downie, who along with Beth Plale from the School of Informatics and Computing at Indiana University co-directs the Research Center, outlined an ambitious agenda of service development, which will be furthered with substantial funding from HathiTrust and from both Illinois and Indiana. These plans include the development of training and services that can be integrated into services in a library’s research commons or in similarly defined programs of advanced support for faculty and students. In addition to the development of these services, the Research Center has received funding for research from the Alfred P. Sloan Foundation, the Andrew W. Mellon Foundation, and the National Endowment for the Humanities. There is tremendous potential for the work undertaken by the Research Center to enable great improvements in the metadata and the content of the HathiTrust collections. They have recently announced the date for their next “Uncamp” (March 30-31, 2015 in Ann Arbor, MI) and released a request for proposals from which they will select projects for advanced research support from HTRC staff. (Researchers, including faculty and students, from HathiTrust member institutions have priority in this call.) The RFP includes detailed information.

Because we have developed such a strong organization, collection, and infrastructure, we can readily address these challenges of print management, document identification, and services for computational research. Yet with all of this underway, we are still growing as an organization, and much of our discussion during the Member Meeting focused on how we can collectively chart HathiTrust’s future paths. At the 2011 Convention, attendees referred a ballot measure to expand the mission of HathiTrust to the new Board of Governors for action. In Washington, board member Brian Schottlaender presented a draft of new language for the Bylaws (Section I - Purpose), developed in response. The language proposed makes clear that HathiTrust should not be as format-bound as we have been in the past. The original bylaws state that we are building a “digital archive of library materials converted from the print collections of the member institutions.” In proposed revisions, our purpose would be to collect “digital content of value to scholars and researchers, including a variety of formats and born-digital materials.” There was general support for these edits, though some members asked for further clarification on other points. We are finalizing the new text and it will be presented to the members for a vote in the near future.

Assuming these changes to the bylaws are passed, we will have to think about what it means for HathiTrust to collect the record of human knowledge in “a variety of formats.” Obviously we must pursue partnerships with publishers and other organizations to collect newly published materials in born-digital format. We made a start with that by collecting newly published university press books made openly accessible through the Knowledge Unlatched pilot project. But this is only a start, and we must be ready to collect material from other sources. In this regard, the discussions around future funding for scholarly monographs remain very important to monitor.

In our first several years we did undertake a pilot project that collected images in HathiTrust, and had plans for a pilot for audio materials that we did not complete. During an open discussion period in Washington I asked “How important are non-text formats for HathiTrust?” and the responses varied. No one disputed their importance, but some cautioned on the timing. For certain members they are critical. These members believe that we must better support visual and graphical materials, including those found in the books in our existing collection, as well as materials that are at risk or otherwise less accessible in our archives and special collections. Some observed that as a body of materials, the government publications--on which we are so heavily focused--are and have always been multi-format. However, others cautioned that we still have much yet to do with the textual materials we’ve collected, and that there are other types of text collections we haven’t touched, such as newspapers. We should, in this view, not lose sight of what we do well and be mindful of the resources required to expand into new formats.

Making clear choices about what you are not going to do can be powerful. Our success stems in part from our clarity and focus on text over the last six years. We’ve now developed great capacity and expertise in managing re-formatted print/text collections, and I am a strong believer in playing to your strengths. Expanding beyond text might be seen to diminish that focus. Although new format choices have implications for development and would affect our resource allocations, “What formats?” is not the only question. What are we trying to achieve, and what types of future access do we need to envision?

Of course, books do not exist in a vacuum, and even a text-bound collection must in the future be able to connect its materials with users regardless of their working environment. Works of fiction, poetry, and other creative genres found in HathiTrust can be related to letters, draft manuscripts and other materials in archives around the world. The long-form arguments embodied in monographs are dependent upon those of other books, as well as articles in serials, primary source documents, collections of data, and so on. Virtually anything can be evidence in a scholarly argument, and for two decades now we have seen many experiments in multi-modal scholarship that attempts to make these relationships between argument and evidence manifest and seamlessly available. In what way can we prepare our infrastructure to connect to, if not collect, those related materials? As our friends at OCLC Research have observed, the “evolving scholarly record” has become more heterogeneous and parts are at risk due to fragmentation in our mechanisms of management and preservation. We will have to address this format question squarely in the coming year, but we will do so in the context of our overall mission, the services we can build together, and related strategic issues. Earlier this year the Program Steering Committee began outlining some issues related to collecting non-text formats. This is only a start of the discussion, and this issue is also in the charge of the newly re-charged Collections Committee.

It’s a very different world now than when we began and clearly it’s time for longer-range planning at HathiTrust. In 2008 “mass digitization” was still less than four years old, opinion about its value was mixed, and its future was uncertain. That is the moment we came from, but as we start 2015 we have many new venues in which to work on these problems as a collective. These include, among others, DPLA, the Digital Preservation Network (DPN), and Academic Preservation Trust (APTrust). These initiatives and others can also transform discovery, preservation, and access to the diverse scholarly products of our researchers and students, especially if we are coordinating our strategies. When we began HathiTrust some commentators doubted that we could be successful, but our success has partially enabled such a flourishing ecosystem of digital library infrastructure. Precisely because of our success, HathiTrust has a special obligation to work with others to help bring “coherence” (to borrow a term) in this environment.  

Whatever we do, these issues need to be addressed from multiple perspectives and with the needs of the membership at the center of the discussion. At the Member Meeting, and in private conversations I have had, some representatives have urged that we undertake our future development initiatives in the most inclusive and transparent manner possible without interfering with our agility. These are important and natural concerns. Our governance structures, including the Board of Governors, the Program Steering Committee, and various working groups, are providing mechanisms for this. For example, the PSC will work on creating processes for identification and evaluation of proposals for major new technical or service developments. In our startup years we have drawn heavily on the resources of the University of Michigan Library. But in 2013 we launched Zephir, developed and operated by the University of California’s California Digital Library to manage metadata for the repository, and the HathiTrust Research Center is co-located at two of our member institutions. HathiTrust increasingly must stand up on its own and continue to draw upon the expertise of all of its members, enabling our libraries to build and offer their own services based on the HathiTrust collections and platform. Some attendees at the Member Meeting offered ideas aimed at making this possible, such as “microgrants” to fund investigations or research and development beyond the scope of the Research Center. Others expressed their hope to form a strong HathiTrust community, and want to see opportunities for member institutions to share programs or projects they’ve initiated based on the HathiTrust collection and services. These are great ideas to explore, and there are others found in the full report on the Member Meeting. I welcome others from you now and at any time. HathiTrust is a partnership focused on sharing responsibility for preserving and curating our resources, and your involvement is necessary.

