Large-scale Search

Slow Queries and Common Words (Part 2)

In Part 1 we talked about why some queries are slow and how these slow queries affect overall performance. The slowest queries are phrase queries containing common words. These queries are slow because the positions index entries for common terms are very large on disk, and disk seeks are slow. These long positions index entries cause three problems relating to overall response time.
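The effect of a common word on a phrase query can be illustrated with a toy, in-memory positional index. What follows is a minimal Python sketch, not Solr's or Lucene's actual on-disk format. The function names `build_positional_index` and `phrase_query` and the sample documents are hypothetical. The count of positions read is only a rough stand-in for the amount of positions data that would have to come off disk.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Toy positional index: term -> {doc_id: [positions]} (hypothetical helper)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def phrase_query(index, phrase):
    """Return (matching doc ids, total positions read) for an exact phrase.

    The positions count is an illustrative proxy for I/O, not a Solr metric.
    """
    terms = phrase.lower().split()
    postings = [index.get(t, {}) for t in terms]
    # Only documents that contain every term can possibly match the phrase.
    candidates = set(postings[0])
    for p in postings[1:]:
        candidates &= set(p)

    positions_read = 0
    matches = []
    for doc_id in sorted(candidates):
        lists = [p[doc_id] for p in postings]
        # Every position of every term in this document must be examined,
        # so a frequent word like "the" inflates this count.
        positions_read += sum(len(lst) for lst in lists)
        later = [set(lst) for lst in lists[1:]]
        # A match starting at position s needs term i+1 at position s + i + 1.
        if any(all(s + i + 1 in later[i] for i in range(len(later))) for s in lists[0]):
            matches.append(doc_id)
    return matches, positions_read


docs = [
    "the cat sat on the mat",
    "the dog and the cat",
    "a cat in the hat",
]
index = build_positional_index(docs)
print(phrase_query(index, "the cat"))  # ([0, 1], 8)
print(phrase_query(index, "cat sat"))  # ([0], 2)
```

Both phrases are two words long. The phrase containing "the" reads four times as many positions, because the common word appears in more documents and more often within each one. At the scale of a large full-text collection, the same effect turns into long positions lists that must be read from disk.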

Current Hardware Used for Testing

This is a brief note on the current hardware and software environment we are using for Solr testing.

Solr Servers

  • Two Dell PowerEdge 1950 blades
  • 2 x Dual Core Intel Xeon 3.0 GHz 5160 Processors
  • 8 GB to 32 GB RAM, depending on the test configuration
  • Red Hat Enterprise Linux 5.3 (kernel: 2.6.18 PAE)
  • Java(TM) SE Runtime Environment (build: 1.6.0_11-b03)
  • Solr 1.3
  • Tomcat 5.5.26

Storage Server

Slow Queries and Common Words (Part 1)

All Queries are not created equal

Update on Testing (Memory and Load tests)

Since we finished the work described in the Large Scale Search Report, we have made some changes to our test protocol and upgraded our Solr implementations to Solr 1.3. We have completed some testing with increased memory and some preliminary load testing.

The new test protocol has these features

Large-scale Full-text Indexing with Solr

[Copied from the Blog for Library Technology]

A recent blog post pointed out that search is hard when there are many indexes to search, because the results must be combined. Search is hard for us in DLPS for a different reason: the size of the data.