Re: The price of Solid State Drives

As I understand your setup, your indexing process is batch-based and does not have near-realtime requirements. This makes it possible (and often desirable) to dedicate some hardware to indexing and other hardware to searching. With such a setup, enterprise-level stability on the search side is not needed, as a catastrophic hardware crash does not mean loss of data.

Your argument about not maximizing performance is technically valid: RAIDing or just connecting SSDs up to the TB level would probably saturate most standard controllers. However, falling short of the maximum possible performance still leaves room for a huge performance boost over conventional hard drives.

Your setup is very interesting as you need both fast random IO and a high bulk transfer rate. Our setup is not heavy on the bulk side (we don't use phrase searches much). On a 4-core machine with a single previous-generation SSD, 4 parallel searches performed at 308% of the speed of a single search, indicating that the CPU was the main bottleneck. Thus, I would not worry too much about random access performance for a commodity RAID setup with Lucene. This still leaves bulk transfers, but here I guesstimate that the hardware specs will be fairly accurate, as bulk transfer is simpler to design for and measure.
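
To put the 308% figure in perspective, here is a minimal sketch of the arithmetic in Python. The function name `parallel_efficiency` is my own, and I am assuming that the 308% refers to aggregate search throughput relative to running one search at a time.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Return the fraction of ideal linear scaling that was achieved."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Figures from the measurement described above (interpretation is an assumption):
speedup = 3.08  # 4 parallel searches ran at 308% of a single search
workers = 4     # one search thread per core on the 4-core machine

efficiency = parallel_efficiency(speedup, workers)
print(f"{efficiency:.0%} of linear scaling")  # prints "77% of linear scaling"
```

Scaling at roughly three quarters of the ideal across 4 cores is what I would expect when the searches mostly compete for CPU rather than queue up behind the drive, though this number alone does not prove it.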
