Re: Posting/position list size computation

Hi Ahsan,

The stats on the size of the postings are for only one field. We have one field that contains all the OCR text for a document (about 800KB) and a number of very small metadata fields based on the MARC bibliographic metadata. Those other fields average 1KB or less. We currently search only the OCR field, so all the calculations are for that one field. The tool we use to get the total number of occurrences of a term is now part of lucene/contrib/. If you run it with the -t flag, it counts the total number of occurrences for each of the top N most frequent terms, which is a proxy for the size of the positions index. You can also specify the field and N.

java org.apache.lucene.misc.HighFreqTerms [-t][number_terms] [field]

-t: include totalTermFreq

See:

http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contri...
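
To make the "proxy" idea concrete, here is a rough sketch of the arithmetic only. It is not the HighFreqTerms source and does not use the Lucene API. The class name, the method names, and the in-memory map of term to per-document frequencies are all made up for illustration, and ranking the top N by document frequency is an assumption on my part. The point is just that each occurrence of a term needs one entry in the positions data, so summing the within-document frequencies gives a count that tracks the size of the positions index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is NOT the Lucene HighFreqTerms code.
final class TermFreqSketch {

    // Total occurrences of one term: the sum of its within-document frequencies.
    // Each occurrence corresponds to one position entry.
    static long totalTermFreq(int[] perDocFreqs) {
        long total = 0;
        for (int freq : perDocFreqs) {
            total += freq;
        }
        return total;
    }

    // Pick the n terms that appear in the most documents (assumed ranking)
    // and map each one to its total number of occurrences.
    static Map<String, Long> topTermsByDocFreq(Map<String, int[]> postings, int n) {
        List<Map.Entry<String, int[]>> entries = new ArrayList<>(postings.entrySet());
        // The array length is the number of documents containing the term.
        entries.sort(Comparator
                .comparingInt((Map.Entry<String, int[]> e) -> e.getValue().length)
                .reversed());

        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> e : entries.subList(0, Math.min(n, entries.size()))) {
            result.put(e.getKey(), totalTermFreq(e.getValue()));
        }
        return result;
    }
}
```

Adding up the values in the returned map gives the total occurrence count for the top N terms, which is the number I was using as the stand-in for positions size.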

Tom
