
commongrams versus stopwords

If you use this technique, I think you should try setting DefaultSimilarity.setDiscountOverlaps(true). Some tests I ran showed that CommonGrams hurts relevance somewhat, because the injected tokens count toward the field length and so lower lengthNorm. Setting that parameter discounts tokens with positionIncrement=0 (which is how the common-gram tokens are stacked) when the field length is computed, and the problem goes away.
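To make the lengthNorm effect concrete, here is a minimal, self-contained Java sketch. It does not call Lucene. The class `LengthNormDemo`, the `Token` record and the `lengthNorm` helper are hypothetical names. The calculation assumes a lengthNorm of 1/sqrt(numTerms), with tokens at positionIncrement=0 subtracted from the term count when overlaps are discounted, which is my reading of the DefaultSimilarity behaviour described above.

```java
import java.util.List;

// Hypothetical simulation of how discounting overlaps changes a
// 1/sqrt(numTerms) length norm; this is NOT Lucene's API, just the arithmetic.
public class LengthNormDemo {

    // A token as an analyzer would emit it: its text and position increment.
    record Token(String text, int positionIncrement) {}

    // Counts every token as a term, optionally ignoring stacked tokens
    // (positionIncrement == 0), then returns 1/sqrt(numTerms).
    static float lengthNorm(List<Token> tokens, boolean discountOverlaps) {
        int length = tokens.size();
        int numOverlap = 0;
        for (Token t : tokens) {
            if (t.positionIncrement() == 0) {
                numOverlap++; // stacked token, e.g. an injected common gram
            }
        }
        int numTerms = discountOverlaps ? length - numOverlap : length;
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // "the quick fox" analyzed without CommonGrams: three unigrams.
        List<Token> plain = List.of(
            new Token("the", 1), new Token("quick", 1), new Token("fox", 1));
        // Same text with the gram "the_quick" stacked on the position of "the".
        List<Token> withGrams = List.of(
            new Token("the", 1), new Token("the_quick", 0),
            new Token("quick", 1), new Token("fox", 1));

        System.out.println("plain:                " + lengthNorm(plain, false));
        System.out.println("commongrams:          " + lengthNorm(withGrams, false));
        System.out.println("commongrams+discount: " + lengthNorm(withGrams, true));
    }
}
```

With these inputs the CommonGrams field drops to a norm of 0.5 (1/sqrt(4)) versus roughly 0.577 (1/sqrt(3)) for the plain field, and discounting the stacked gram restores the plain value. Since norms are computed and stored at index time, existing documents would presumably need reindexing after changing the setting.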

