HathiTrust 2014 Member Meeting Notes

HathiTrust 2014 Member Meeting
Hotel Palomar, Washington, D.C.
October 10, 2014

9:00 - Welcome and Introductions

Sarah Michalak, University of North Carolina, Chair, HathiTrust Board of Governors

Michalak welcomed attendees and began the meeting by reflecting on her experiences as a member of the HathiTrust Board and Executive Committee. Her role on these bodies afforded her a view of the daily experiences and challenges of HathiTrust, from issues of copyright and application development to the lawsuit, exposing how parts of the membership work together as a whole to operate one of the most magnificent resources ever created. Michalak encouraged attendees to think about the national and international impact of HathiTrust, and introduced the meeting as the first of many opportunities for members to steward and advance HathiTrust.

9:15 - Executive Director Report 

Mike Furlough, Executive Director, HathiTrust [PDF]

Furlough began his remarks by discussing the strength of HathiTrust’s position as an organization and the solid base the membership provides for future action, having already impacted norms in library behavior. After briefly reviewing the history and structure of HathiTrust, he spoke about the shared responsibilities and goals of HathiTrust, the need to help different members participate in the organization, and HathiTrust’s mission to serve both members and the public good. Furlough identified several areas related to HathiTrust’s strategy, mission, and future role where further consideration is needed. These included membership growth, development of HathiTrust’s collections program, involvement in issues of public policy, participation in national and international digital infrastructure, and the development of services for members and the public. He highlighted areas of consideration related to HathiTrust as an organization as well: deepening engagement with researchers, libraries, and member institutions, and thinking about the structures we put in place so that HathiTrust can “stand on its own” as an organization. Furlough concluded by outlining several assumptions about HathiTrust’s work going forward, including alignment with member goals, working collaboratively, scale as a driving force for activities, and the broad nature of HathiTrust’s community, encompassing libraries, publishers, and authors.

Questions & Comments:

Stephen Downie (University of Illinois at Urbana-Champaign) asked what the biggest impediment to European libraries joining HathiTrust might be.

Furlough responded that the main impediment so far has been a lack of time to undertake systematic outreach.  Adding more European members will require thought and discussion with the potential members about the services that would benefit them most. For instance, US members benefit from exemptions from the copyright law that may not be available to European institutions.

Zheng (John) Wang (University of Notre Dame) asked if HathiTrust has thought about growing membership in Asia.

Furlough responded that we have not focused on this yet, but we do have a relationship with Keio University and have ingested 90,000 Japanese language public domain volumes from Keio.

9:45 - Progress on Constitutional Convention Initiatives and Program Steering Committee Report 

Robert  Wolven, Columbia University, Chair, Program Steering Committee [PDF]

Wolven covered two topics in his presentation, providing an update on the activities of the Program Steering Committee and on the ballot initiatives that came out of the Constitutional Convention.

Program Steering Committee

The Program Steering Committee was established as part of the new governance structure and was given the charge of reviewing the development agenda and shaping initiatives and strategies. Eleven members were appointed by the Board to serve for 2-year terms. Activities in 2013-2014 have included focus on Constitutional Convention ballot initiatives, re-establishment of the Collections Committee, creation of the Rights & Access Working Group, and the charging of an advisory group for Zephir, the bibliographic data management system for HathiTrust. Charges for these groups are available at http://www.hathitrust.org/working_groups.

Constitutional Convention Ballot Initiatives

The Constitutional Convention (October 2011) resulted in 5 ballot proposals, charging HathiTrust to establish a stable and effective governance structure, to expand and enhance access to U.S. federal publications, to establish a distributed print monograph archiving program, to formalize a transparent process for development initiatives, and to develop and vet a fee-for-service model.

  • Governance structure: A Board of Governors and a 5-member Executive Committee were established by the ballot proposal, which also charged the Board of Governors with establishing bylaws. The Board of Governors also has the power to develop committees and working groups to carry out the work of the HathiTrust.

  • Government Documents Initiative Planning & Advisory Group: This group is composed of HathiTrust members and non-members. Mark Sandler discussed the initiative in further detail (see below).

  • Print Monographs Archive Task Force: This group was formed to address the ballot initiative. Tom Teper discussed the initiative in further detail (see below).

  • Development initiatives: Due to focus on other initiatives and groups, little work has been done thus far to implement this ballot initiative, but the PSC will be working on this shortly. Stakeholders have so far offered little feedback to give direction for development.

  • Fee-for-service model: When this ballot proposal was passed, HathiTrust had a different financial model; the model has since changed, and it is uncertain how a fee-for-service model will fit within the existing fee structure. In addition, many libraries are now members, and the demand has changed. Wolven suggested passing this to the Collections Committee for resolution.

Looking Ahead: PSC in 2014-2015

Wolven outlined several ways that the PSC can implement plans and recommendations, such as responding to reports and proposals, defining resource needs (for example, determining whether new staff need to be hired), and overseeing and monitoring programmatic activities. There are four possible areas of strategic focus:

  • Non-text formats

  • Quality assurance and validation

  • Services for print disabilities

  • Metadata strategies and policies

Briefs on each of these issues are available.

Questions & Comments:

Sarah Pritchard (Northwestern University) asked about the process for making decisions.

Wolven responded that the answer will be different in different areas. The PSC can help shape the issues for the Board, and the Board can consider what kind of engagement is necessary with the membership to move in certain directions.

Laine Farley (California Digital Library) asked how the various parts of the governance structure are working together.

Wolven described how the PSC has a biweekly conference call and uses a Google Site for drafts and notes. There is a lot of activity going on. The big question is how to move from discussion to action, which is what the working groups are doing. The PSC is learning how to work together, how to keep the group focused, how to get work done remotely. A big strategic question for HathiTrust is how much to rely on distributed effort and how much should be done by centralized staff. The Copyright Review Management System is a good example of a distributed effort that has succeeded.

Current HathiTrust Initiatives

10:30 - Developing a Distributed Print Monographs Archive 

Tom Teper, University of Illinois at Urbana-Champaign, Chair, HathiTrust Print Monographs Archive Planning Task Force [PDF]

Teper discussed the background and purposes of a print monographs archive initiative by HathiTrust. He described how academic institutions hold substantially duplicated print collections, and how the notion of a “collective collection” of materials held not by one institution, but collectively by many institutions is causing libraries to think about how to manage their collections in a collaborative way. The Print Monographs Archive Planning Task Force grew out of a ballot initiative passed at the Constitutional Convention, and began work in June 2014. Ten members, including individuals from non-HathiTrust partners, currently serve on the task force, chaired by Teper. Primary issues the Task Force is considering  include the qualifications participating institutions should meet, the kinds of analysis that are needed to support the print archive, how appropriate volumes will be identified for inclusion, the retention commitments that will be needed, models for discovery, access and service, models for business and financial management, and the roles and relationships of HathiTrust and other organizations and libraries.

In order to address these issues, the Task Force has created an aggressive timeline:

  • June–September 2014:  Planning and analysis.

  • October–November 2014: Initial drafting.

  • December 2014–January 2015: Substantial completion, drafts ready for broader feedback.

The Task Force is operating on the assumptions that the network of print monograph archives will:

  • mirror the monographic holdings in HathiTrust;

  • represent a “new paradigm” for print collection management;

  • be built from the collections of HathiTrust members and serve the members;

  • be distributed for security purposes;

  • be persistent and preserve the print record;

  • be governed and managed by the HathiTrust (not by a subset of its members).

Members of the Task Force envision that the archive will be a “loose-tight organization” with an underlying audit process to ensure commitments are being honored. Features that will distinguish the initiative from other print archiving projects include: its focus on monographs, its centering around a digital corpus, the commitments institutions make to print retention, and the provision of a suite of tools to support local decision-making.

The Task Force is seeking input on several issues, including:

  • What is the value of the archive to your institution?

  • Is it a true assumption that membership in the archive is not divisible from membership in HathiTrust?

  • What kind of costs will this project incur and how will remuneration work?

Questions & Comments:

Greg Raschke (North Carolina State University) asked how libraries might push themselves a little more on shared print collections.

Teper responded that one option to move toward more action is pre-identifying institutions that would become archives. Perhaps six or seven libraries could be brought on board early, with collections built with them first.

Leslie O’Brien (Virginia Tech) asked whether the Task Force has had specific discussions with regards to costs.

Teper responded that there have been conversations about the costs to institutions to maintain items and whether or not payments should be made. If payments were to be made, would they be structured as one-time payments or as ongoing payments for continuing work? In such a scenario, would there be a subsidy for scarcity? The archive is meant to be a light archive to allow circulation of items, which raises questions about how to address the issue of rare books.

Barbara Dewey (Penn State) asked if the Task Force is mapping large print repositories that are already underway (e.g., CIC, ReCAP), and whether this project duplicates them or makes some kind of virtual use of the existing shared print initiatives.

Teper responded that there is not much overlap, as other larger print initiatives are focused on serials, and this project is a monographic project. The Task Force is looking at the OCLC report on regions, and a member of the Task Force is the executive director of ReCAP, so those perspectives are being included.

10:45 - Expanding Coverage and Enhancing Access to US Government Documents

Mark Sandler, Center for Library Initiatives, Committee on Institutional Cooperation, Chair, Government Documents Initiative Planning and Advisory Group [PDF]

Sandler discussed the charge of the HathiTrust Government Documents Initiative Planning and Advisory Working Group, which has its origins in the 2011 Constitutional Convention, and reviewed both the overall status of US federal government documents in HathiTrust and the progress of the working group. As of September 2014, HathiTrust holds 568,218 US federal government documents. HathiTrust is currently working to build a comprehensive registry of US federal government documents, and is partnering with Google to analyze bibliographic records for government documents from approximately 40 US institutions to determine which of the documents held by US libraries have been digitized to date.

In October, the working group submitted a 20-page report describing details and complexities of the initiative to the Program Steering Committee (PSC). The report separates possible future activities of the government documents initiative into near-term, intermediate-term, and long-term activities.

Near-term activities:

  • Continue to build the Registry

  • Provide regular updates on the existing corpus

  • Member libraries (esp. CIC and UC) to continue to supply digital content

  • Enlist the support of docs librarians to analyze, organize and promote the corpus

  • Pursue partnerships with government agencies for records and content

  • Identify existing complementary projects and pursue partnerships

Intermediate-term activities:

  • Hire a project manager

  • Address uncataloged content

  • Seek out already-digitized content

  • Carry out quality assurance and deduplication

  • Enhance search and discovery options

Long-term activities:

  • Include content beyond that distributed by GPO

  • Enhance HathiTrust functionality to include rich-format content

  • Revisit access policies for non-member downloading

  • Develop a communication plan to promote the corpus

While the working group awaits comments from the PSC, group members plan to conduct an environmental scan of current work to catalog and digitize federal government documents in the US, and may form subgroups to flesh out specific recommendations in the report.

Questions & Comments:

Wendy Lougee (University of Minnesota) said that the GPO has expressed concerns about validating the authenticity of government documents. Has anything been done to address this concern?

Sandler responded that this issue has not yet come up in discussion.

11:00 - The HathiTrust Research Center

Stephen Downie, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, Co-Director, HathiTrust Research Center [PDF]

Stephen Downie co-directs the HathiTrust Research Center (HTRC) with Beth Plale, Director of the Data to Insight Center and professor in the School of Informatics and Computing at Indiana University. The Research Center began in 2011 so that HathiTrust could enable researchers to accomplish tera-scale text data-mining and analysis. In order to do this, HTRC has focused on developing cyberinfrastructure for high performance computing access to the HathiTrust Digital Library, developing software tools for processing and analyzing text, and developing translational tools that can enable simplified access to these services for users. The HTRC has received grants from the Alfred P. Sloan Foundation, the Andrew W. Mellon Foundation, and the National Endowment for the Humanities to help create this infrastructure and to fund original research. A new operations plan for 2014-2018 will also include funding from HathiTrust as well as Indiana University and the University of Illinois, where the center is co-located. Downie described four areas of work: 1) Core Development, which operates the infrastructure, 2) Advanced Collaborative Support, which provides users with access to expert staff to help them develop and refine their research projects, 3) Advanced Research, which would be grant funded and support work that would not immediately translate to services, and 4) Scholarly Commons Users Support Service, which will create training and outreach programs to support librarians and digital humanities staff so that they can help their users take advantage of the HTRC. Downie noted that the 3rd HathiTrust Research Center UnCamp will be held in Ann Arbor, MI on March 30-31, 2015, and that a request for proposals to the Advanced Collaborative Support group would be forthcoming soon.

Questions & Comments:

Eliz Kirk (Dartmouth College) asked if the HTRC is looking at tools that allow people to look at just parts of the corpus (e.g., government documents).

Downie responded that yes, the HTRC is coming up with tools to allow people to select a subset of the collection. An example is music retrieval. This has been identified by users as a need.

Betsy Wilson (University of Washington) asked if Downie could characterize what the RFP for Advanced Collaborative Support is calling for from institutions or faculty.

Downie responded that submissions should indicate that they are interested in a project with HathiTrust. HTRC will fund development time to get the project off the ground. The RFP process is based on the XSEDE model to allow development time to move code or algorithms to the XSEDE network of High Performance Computing instruments.

11:15 - HathiTrust Budget Report

Richard Clement, University of New Mexico, Chair-elect/Treasurer, HathiTrust Board of Governors [PDF]

Clement reviewed HathiTrust’s budgeting process and calendar, the categories used in the budget, planned and actual spending for the 2014 fiscal year (calendar year), and future planning. HathiTrust uses the calendar year as its fiscal year and sends invoices to members in January of each year. Revenue for HathiTrust comes primarily from partner fees with a very small amount coming from investment by the University of Michigan, where the budget is held. Expenses are broken into categories for operations (personnel, infrastructure, travel and hosting, contracted services, Zephir) and programmatic activities (government documents initiative, the print monographs archive, copyright review, HTRC, other future projects). Clement shared the budget report as of August 31, 2014, showing the planned costs for 2014 as well as the year-to-date costs, and the final projected costs by end of 2014, which are anticipated to be on track with original projections.

The major factors impacting the 2015 budget are the need for additional staff to support outreach and programs, the initial stages of the government documents and print monographs archive programs, as well as other projects that will be initiated. The Board has also asked the Executive Director to begin developing multi-year projections, to include activities related to the government documents and print monographs archive initiatives, participation in the Digital Preservation Network (DPN), copyright review work, new collection areas, growth in membership, and analysis of the financial model.

Questions & Comments:

Carol Mandel (New York University) commented on “HathiTrust costs” versus “institutional costs,” picking up on comments made during the discussion on establishing a Shared Print Archive.  She cautioned HathiTrust working groups to focus on the core issues of their charge and avoid getting too caught up in analyzing and projecting fees and costs for activities.  Those are issues that can be examined during implementation and will be the province of the Board of Governors.

Brian Schottlaender (University of California, San Diego) stated that we should start considering for budget and planning purposes what types of other metadata HathiTrust will need, and other types of supporting registries.

12:15 - “Mammoth Tending: Notes from the Underfoot”

Jack Bernard, Associate General Counsel, University of Michigan

Jack Bernard gave a keynote speech covering the history of the Google Books project, the nature and purpose of copyright law, and the lawsuit against HathiTrust.

1:15 - Introduction and discussion:  Changes to the HathiTrust Bylaws 

Brian Schottlaender, University of California, San Diego, Past-chair, HathiTrust Board of Governors [PDF]

Schottlaender introduced proposed revisions to the HathiTrust bylaws. Prior to opening the floor for discussion, John Wilkin, speaking on behalf of the Committee on Institutional Cooperation (CIC), presented comments on the proposed revisions that were submitted by the CIC prior to the meeting.

Wilkin commented:

  • The expansion of the mission to materials beyond those digitized from print should not come at a loss of representation of the print.

  • The language specifying the co-ownership of the partner institutions should be retained.

  • The bylaws should use the positive statement relating to free-riders that is found in the current bylaws, rather than reverting to the more negative language found in the initial goals.

Questions & Comments:

Elisabeth Long (University of Chicago) asked what members can take advantage of with regard to the materials being “co-owned.” What are the advantages or implications of joint ownership?

Bernard responded that HathiTrust currently allows member libraries to make lawful uses of materials in HathiTrust that they currently own or previously owned. HathiTrust has been declared to be an “authorized entity” under Section 121 of the US Copyright Act, which can enable distribution of materials specifically to users who are blind or who have other disabilities. We have this right under law but do not gain rights from digitization alone and should be careful that this is not implied in the uses we make.

Winston Tabb (Johns Hopkins University) commented that the revised language in the bylaws should not be taken by the Board as a mandate.

1:45 - Structured Discussion: Future directions for HathiTrust

Mike Furlough, Executive Director

Furlough led discussion for the remainder of the afternoon. The following questions were offered to give attendees a chance to identify opportunities and issues of concern for Furlough and the Board to address.

Question: Do you have any questions or observations at this point in the day?

Greg Raschke (North Carolina State University) asked what types of grants can be funded to do research and development projects around HathiTrust. What can we do to seed innovation grants? Can we have an R&D grant fund or work with the Research Center?

Mary Case (University of Illinois at Chicago) asked what can we do to communicate better around HathiTrust issues. Can we increase awareness of the HathiTrust name and recognition? She specifically asked Jack Bernard if there would be any value in communicating with reporters in major news outlets.

Lorelei Tanji (University of California, Irvine) stated on behalf of the University of California that they support the suggested amendments to the bylaws to include non-textual materials. However, they believe there is still a lot of work to be done surrounding HathiTrust’s work with textual materials. Some of the areas include the creation of an approval process for development initiatives; a process for prioritizing projects and enhancements; further work towards quality control process; the development of strategies and policies for collection and use of metadata; and access for users who have print disabilities.

Pat Steele (University of Maryland) commented also, with a view toward what is needed in a broader preservation ecosystem, that HathiTrust should concentrate on textual content first and see what else develops.

Question: How important are non-text collections? How important are new publications?

Brian Schottlaender (University of California, San Diego) stated that HathiTrust members have a great deal of expertise in many areas, and that HathiTrust has untapped capacity. He asked how we could formalize an assessment of that capacity for doing more. We need to find out what people can and are willing to do.

Martha Hruska (University of California, San Diego) said that there is a lot of depth and wealth of materials in HathiTrust, but wondered how we partner with other institutions on similar issues (e.g., government documents and print archiving). We need to connect the dots more carefully within HathiTrust.

Speaking of HathiTrust’s participation in DPN, Furlough commented that HathiTrust needed to work with DPN and other “nodes” to ensure that we meet our obligations and to make sure we’re not duplicating work in multiple repository programs. We still need to determine the nature of our partnership with DPN and what partnership means.

Stephen Downie (University of Illinois at Urbana-Champaign) noted that scholars have discussed issues in historical musicology with him, including interest in being able to mine the corpus for items other than text. Adding searching capabilities for these would expand the use and value of HathiTrust exponentially.

Judy Russell (University of Florida) stated that non-text collections are incredibly important. With regard to government documents, in order to have a comprehensive corpus, we need to include materials that are born-digital. The University of Florida has digital content that isn’t in HathiTrust, including a lot of non-text content. The University of Florida joined HathiTrust in order to contribute content, and would like to contribute it. She believed there were other HathiTrust partners in similar circumstances. With regard to new publications, the University of Florida is required to embargo their content for a time, but it is important to address this content.

Robert Wolven (Columbia University) stated that we need to continue to work on print, but we need to start looking at these other issues.

Carol Mandel (New York University) noted that we do not all have to be doing the same things. There is a lot of varied expertise across the members. Different members can focus on different things.

John Wilkin (University of Illinois at Urbana-Champaign) commented that there is a separation between publication and archiving, and tremendous costs in both activities. As we grapple with new types of publications, if we connect these two, we can remove costs from the system.

Oya Rieger (Cornell University) commented on non-text collections, saying that we should call on visually-oriented staff to begin mapping the universe of other kinds of non-textual content resources (e.g., Shared Shelf, etc.), and to help us move from more abstract concepts to concrete architecture. We need to think about ingest paths and workflows for specialized collections. Cornell has DLXS collections which are text-based digital collections, and a pilot underway to use Hydra for metadata and to push content to HathiTrust. She would like to hear more about research data.

Furlough did not believe that people were assuming that HathiTrust would be a solution for storing research data and noted that there were other options out there for this.

Schottlaender stated that HathiTrust is best for masses of content ingest, as opposed to small batches of specialized content.

Keith Webster (Carnegie Mellon University) said that CMU has an ongoing project to manage data and software over time (Olive Executable Archive). They would like a collective community solution for obsolete software and executable content.  

Question: How can HathiTrust become better integrated into the work of your researchers, educators, and students?

Leslie O’Brien (Virginia Tech) stated that they would like HathiTrust content to be discoverable locally, and requested that work be done to increase the visibility of content in Summon and WorldCat.

Laine Farley (California Digital Library) mentioned how, for a sesquicentennial at the University of California, a group used the HathiTrust search widget, created a custom collection, and put the widget on a local webpage celebrating the event.

Abby Sheel (Florida State University) indicated that she is interested in advanced uses of HathiTrust and more training for educational purposes. She attended the HTRC unCamp as a librarian, but no researchers from FSU went. They need to be able to bring these skills back to campus. In addition, she would like to emphasize the importance of improving metadata and cleaning it up. For example, record duplication occurs because of incomplete records. When loaded into their local catalog, the records look different because of missing metadata fields, authority records that don't match, etc.

Stephen Downie (University of Illinois at Urbana-Champaign) suggested student opportunities to work at UM or CDL on systems. His students are interested. Perhaps there could be an apprenticeship program to train the next generation of practitioners.

Furlough suggested that remote work by students is a possibility, and could facilitate a program of this nature. More partnering could be done with Information schools as well.

Laine Farley (California Digital Library) talked about a historian she spoke with who did not understand what is in HathiTrust. The statistics and visualizations HathiTrust provides on the website were valuable in explaining this. Perhaps more things like this could be provided.

Laura Wood (Tufts University) said she was interested in hearing about proxied versus direct access to HathiTrust content.

Question: What is the most important thing HathiTrust can do for your library in the coming year?

Carol Mandel (New York University) was interested in increased engagement and communication. Last year was focused more on being a learning organization. She would like a lot more front-line reference people engaging with HathiTrust.

Furlough responded that he’s been spending the last several months visiting member libraries and working on outreach.

John Wilkin (University of Illinois at Urbana-Champaign) stated that we should move forward on a shared print repository. HathiTrust is good at leveraging existing infrastructure. As a first step, we should get something in place that is tied to the print holdings database, where we register who holds what. This could be revolutionary and utterly transformative. Then we can turn our attention to rights and quality.

Terese Heidenwolf (Lafayette College) commented that liberal arts colleges could offer a lot to the community. She has talked to students and faculty at other institutions, and there has been a lot of interest expressed in joining HathiTrust, but few have actually joined. We should focus on member growth.  

Question: What is the most important thing we can do together in the coming year?

Tom Teper (University of Illinois at Urbana-Champaign) commented that HathiTrust has received a lot of support from the National Federation of the Blind, but we have no members that are state institutions. There are a lot of opportunities for member growth there.

Furlough stated that there is a need for coordinated planning and long-range planning which is separate from specific initiatives. This should be done through a community process that is inclusive and responsive.

Jeremy York (HathiTrust) stated that there is a lot HathiTrust can do to leverage the contributions of member institutions. The User Support Working Group is a great example of success.

Closing Words

Sarah Michalak, University of North Carolina, Chair, HathiTrust Board of Governors

Michalak thanked attendees for their participation and expressed her hope that attendees would go home feeling more involved and engaged.