HathiTrust Research Center
The HathiTrust Research Center focuses on the combined strengths of the collection to spark curiosity: what can we learn from so many books? HTRC can help you and your users find the right questions and help answer them using text and data mining. Find easy entry points to digital humanities tools, as the entire collection is open for text and data mining explorations. Members may also access copyrighted titles for research in more advanced computing environments. Whether for classroom use, individual research projects, or ongoing library instruction, HTRC tools illuminate the landscape of human knowledge.
Benefits of the HathiTrust Research Center
Support Digital Humanities at All Levels
- Novice digital humanities researchers can dig into their questions using simple tools without any programming experience.
- Advanced researchers can access robust datasets and apply for access to secure, powerful computing environments that support advanced text analysis and other demanding workloads.
- Help your library build functional knowledge in digital humanities methods and tools with hands-on workshops and training.
- Research support is provided to HathiTrust members through a periodic competitive awards program called Advanced Collaborative Support.
Explore the Collection at Scale
- Investigate 18+ million volumes and trillions of words with powerful, behind-the-scenes computation.
- Subjects range from Archeology to Zoology with titles in more than 400 languages.
- Discover new lines of research or new findings to answer long-standing questions.
Text and Data Mining Tools
HTRC has developed a suite of tools and services for text and data mining, including web-based algorithms, freely accessible datasets, and secure computing capsules. Access the tools below, as well as tutorials and other documentation, on HTRC Analytics.
HathiTrust + Bookworm
Visualizes word trends in millions of volumes held by HathiTrust. It enables scholars to discover new textual use patterns across the entire corpus, including in-copyright and public domain volumes.
HTRC Algorithms are click-to-run tools for text analysis. They require no programming, and researchers can set the parameters for their analysis. Use them to explore HathiTrust worksets, which are groups of titles from the collection.
Datasets and Data APIs
A dataset compiles specific features extracted from the full text of the HathiTrust corpus. Certain datasets are freely available and pre-created. HathiTrust also provides access to several data APIs. Other data and datasets can be accessed upon request.
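To make the idea of a feature dataset more concrete, the sketch below reduces each page of text to per-page token counts, so the counts can be analyzed without the readable text. It is a minimal illustration only: the function names, the record layout, and the sample pages are hypothetical, and it is not HTRC's actual extraction pipeline or dataset format.

```python
import re
from collections import Counter


def page_features(page_text):
    """Return token counts for one page (hypothetical helper).

    Only counts are kept; word order is discarded, so the page
    cannot be read back from the result.
    """
    tokens = re.findall(r"[a-z]+", page_text.lower())
    return dict(Counter(tokens))


def volume_features(pages):
    """Build one feature record per page for a list of page strings."""
    records = []
    for number, text in enumerate(pages, start=1):
        counts = page_features(text)
        records.append(
            {
                "page": number,
                "token_count": sum(counts.values()),
                "tokens": counts,
            }
        )
    return records


# Made-up sample pages, used purely for illustration.
sample_pages = ["The whale, the whale!", "Call me Ishmael."]
features = volume_features(sample_pages)
```

Records like these could then be aggregated across many volumes, for example to chart how often a word appears over time.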
Data capsules, an advanced computing environment available only to HathiTrust members, provide high-capacity computing for advanced text analysis. Access to in-copyright material is available to HathiTrust members.
Whether you’re new to text and data mining or going deeper into research, you can get started at HTRC’s main website.
HTRC’s mission serves a range of uses and users, with priority given to HathiTrust members. However, affiliation with a HathiTrust member institution is not required to use most tools and services, and the core services are available at no cost. Some HTRC tools and services require an account on HTRC Analytics. Access to copyrighted material for some research methods is available only to member researchers.
Support and Training
The main website for HTRC is a good starting point to learn more about the center and its approach to text and data mining.
New to the HathiTrust Research Center? Find everything you need to get started including videos, guides, and more.
This policy defines non-consumptive research and non-consumptive exports as implemented for non-profit research and educational analytical use of the HathiTrust Digital Library.
Find step-by-step instructions and more training resources on the HTRC Wiki.
HTRC Advanced Collaborative Support offers specialized expertise, developer time, and compute resources to researchers who apply for and are awarded support.
This policy outlines the terms for researchers using HTRC's advanced research computing environment that may include copyrighted texts.
Contact our member-led user support team for help!