re:PatternTokenizerFactory

Thanks for the feedback, David,

Sorry I didn't explain the algorithm very well; I probably should have put a code snippet in my previous response. We aren't splitting on punctuation. We are constructing tokens in which whitespace replaces the punctuation, so each input term still yields a single token.
Examples:
"l'art" => "l art"
"can't" => "can t"

Our problem was that the WDF was splitting on punctuation and therefore turning "l'art" into two tokens, which resulted in a phrase query for the token "l" followed by the token "art". Our filter instead produces a single query (boolean clause) for the one token "l art" rather than for the token "l'art".
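
To make the difference concrete, here is a minimal sketch in plain Python rather than the actual Lucene/Solr classes. The function names and the regular expression are illustrative assumptions: "punctuation" is taken to mean any character that is not a letter, digit, underscore, or whitespace, and the splitting function is only a simplified stand-in for what the WDF does, not its real implementation.

```python
import re

# Assumption: punctuation = anything that is not a word character or whitespace.
# The character set used by the real filter may differ.
PUNCT = re.compile(r"[^\w\s]")


def split_on_punctuation(term):
    """Simplified stand-in for splitting behaviour: one term becomes several tokens."""
    return [part for part in PUNCT.split(term) if part]


def replace_punctuation(term):
    """The approach described above: one term stays one token, with each
    punctuation character replaced by a space."""
    return PUNCT.sub(" ", term)


for term in ("l'art", "can't"):
    print(repr(term),
          "split:", split_on_punctuation(term),
          "replaced:", repr(replace_punctuation(term)))
```

With splitting, "l'art" comes out as the two tokens "l" and "art", which is what leads to a phrase query. With replacement it comes out as the single token "l art", so the query is one clause on that token.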

Tom
