New Hardware for Searching 5 Million+ Full-Text Volumes
On November 19, 2009, we put new hardware into production to provide full-text search across about 4.6 million volumes. About 5.3 million volumes are now indexed. Below is a brief description of our current production hardware. Future posts will cover its performance and the background of our experiments with different system architectures and configurations.
Hardware details
Solr Server configuration
- Dell PowerEdge R710
- 2 x Quad Core Intel Xeon E5540 2.53GHz processors (Nehalem)
- 72 GB RAM
- Red Hat Enterprise Linux 5.4 (kernel: 2.6.18 x86_64)
- Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
- Solr 1.3.0.2009.09.03.11.14.39 (1.4-dev 793569)
- Tomcat 5.5.27
Storage
- Isilon IQ NAS cluster (20 I/X-series nodes)
- 480 SATA drives (a mix of 750 GB and 1 TB drives) providing 420 TB of raw storage
- 4 GB RAM per node, giving 80 GB of coherent cache in aggregate
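The storage figures can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB) and that the 420 TB raw figure is exact rather than rounded; under those assumptions the drive mix works out to an even split. The function names are illustrative only.

```python
def aggregate_cache_gb(nodes: int, ram_per_node_gb: int) -> int:
    """Coherent cache across the cluster, counting every node's RAM."""
    return nodes * ram_per_node_gb


def drive_mix(total_drives: int, raw_gb: int,
              small_gb: int = 750, large_gb: int = 1000) -> tuple[int, int]:
    """Solve small + large = total_drives and
    small * small_gb + large * large_gb = raw_gb for whole drive counts.

    Assumes decimal units and an exact raw-capacity figure.
    """
    # Substituting large = total - small gives:
    # raw = total * large_gb - small * (large_gb - small_gb)
    numerator = total_drives * large_gb - raw_gb
    small, remainder = divmod(numerator, large_gb - small_gb)
    if remainder or not 0 <= small <= total_drives:
        raise ValueError("no whole-drive mix matches these totals")
    return small, total_drives - small


print(aggregate_cache_gb(20, 4))   # 20 nodes x 4 GB
print(drive_mix(480, 420_000))     # (750 GB drives, 1 TB drives)
```

If the 420 TB figure was rounded, `drive_mix` raises instead of guessing, which is deliberate: the check only holds for exact totals.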
Network
- NFS runs on a dedicated, private GbE network with a 9K MTU (jumbo frames) on a Dell PowerConnect 5448 switch
- NFS clients are single-homed, and their mounts are automatically distributed across all cluster nodes
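The post does not say which mechanism distributes the NFS mounts across the cluster nodes. As a rough illustration of the idea only, the sketch below assigns each single-homed client to one node in round-robin order. The host names (`isilon-node-NN`, `search-N`, `indexer-1`) are hypothetical; the five clients correspond to the four search servers and one indexing server described below.

```python
from itertools import cycle


def assign_mounts(clients, cluster_nodes):
    """Round-robin each NFS client onto one cluster node (hypothetical policy)."""
    if not cluster_nodes:
        raise ValueError("need at least one cluster node")
    nodes = cycle(cluster_nodes)
    # Each single-homed client gets exactly one mount target.
    return {client: next(nodes) for client in clients}


# Hypothetical names: 20 cluster nodes, 4 search servers, 1 indexing server.
cluster = [f"isilon-node-{i:02d}" for i in range(1, 21)]
clients = ["search-1", "search-2", "search-3", "search-4", "indexer-1"]

for client, node in assign_mounts(clients, cluster).items():
    print(f"{client} -> {node}")
```

Spreading clients over different nodes keeps any single node's network port and cache from becoming the bottleneck; a real deployment would also need to handle node failures, which this sketch ignores.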
Current Solr Architecture and Configuration
Search Servers
- 4 servers, each running one Tomcat instance that hosts 3 shards; 10 of the 12 shards are currently in use
- 16 GB allocated to each server's JVM
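A query against a sharded index like this is typically fanned out with Solr's distributed search, where the `shards` request parameter lists every shard as `host:port/path`. The sketch below builds such a list for 10 shards spread over 4 servers with 3 slots each. The host names, the Tomcat port 8080, the `/solr/shardN` paths, and the choice to fill servers in order are all assumptions for illustration; the post does not say how the 10 active shards are placed.

```python
from urllib.parse import urlencode

SERVERS = 4
SHARDS_PER_SERVER = 3
SHARDS_IN_USE = 10


def shard_addresses(servers=SERVERS, per_server=SHARDS_PER_SERVER,
                    in_use=SHARDS_IN_USE):
    """Return host:port/path entries for the active shards (hypothetical layout)."""
    if in_use > servers * per_server:
        raise ValueError("more shards in use than slots available")
    addresses = []
    for shard in range(1, in_use + 1):
        # Fill servers in order: shards 1-3 on server 1, 4-6 on server 2, ...
        server = (shard - 1) // per_server + 1
        addresses.append(f"solr-search-{server}:8080/solr/shard{shard}")
    return addresses


def distributed_query_url(q, addresses):
    """Send the query to the first shard and let Solr fan it out to all of them."""
    params = urlencode({"q": q, "shards": ",".join(addresses)})
    return f"http://{addresses[0]}/select?{params}"


addrs = shard_addresses()
print(addrs[-1])
print(distributed_query_url("title:whale", addrs))
```

Any shard can act as the entry point for a distributed request; it merges the per-shard results before responding.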
Indexing Server
- 1 server running 12 Tomcat instances, one per shard; 10 of the 12 Tomcats/shards are currently in use
- 6 GB allocated to each of the 10 active JVMs
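The JVM allocations above can be compared with the 72 GB in each server. The sketch below assumes that the search and indexing servers share the Dell R710 configuration listed earlier, and treats each allocation as the JVM's heap, ignoring JVM overhead beyond the heap. The result is therefore an upper bound on the memory left for the operating system, including its file cache for the NFS-mounted indexes.

```python
SERVER_RAM_GB = 72  # per server, assuming all Solr servers match the R710 spec


def ram_outside_jvms(jvm_heaps_gb, total_ram_gb=SERVER_RAM_GB):
    """RAM not claimed by JVM heaps (upper bound; ignores non-heap JVM memory)."""
    used = sum(jvm_heaps_gb)
    if used > total_ram_gb:
        raise ValueError("heaps exceed physical RAM")
    return total_ram_gb - used


search_free = ram_outside_jvms([16])          # one 16 GB JVM per search server
indexing_free = ram_outside_jvms([6] * 10)    # ten active 6 GB JVMs
print(search_free, indexing_free)
```

By the same arithmetic, running all 12 indexing JVMs at 6 GB each would account for the full 72 GB, leaving nothing for the operating system.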

Comments
What kind of startup
storage media, longevity, reliability
Thank you for your questions. We've added some information at http://www.hathitrust.org/technology that should help to answer them.