Parse/tokenize a string with Lucene 7.3.0

To parse (tokenize) a string with Lucene, like so:

List<String> terms = parseText("This is a test");

Add the following Maven dependency:

    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-core</artifactId>
      <version>7.3.0</version>
    </dependency>

And imports:

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

And the code:

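    // Assumption: the original snippet uses `analyzer` without declaring it;
    // a shared StandardAnalyzer is the most likely intent.
    private static final Analyzer analyzer = new StandardAnalyzer();

    public static List<String> parseText(String text) {
        List<String> result = new ArrayList<>();
        if (text == null) {
            return result;
        }

        // try-with-resources closes the stream so the analyzer can be reused
        try (TokenStream stream = analyzer.tokenStream("nofield", new StringReader(text))) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                result.add(term.toString());
            }
            stream.end();
        } catch (IOException e) {
            // Unlikely with a StringReader, but do not swallow it silently
            throw new IllegalStateException("Tokenizing failed", e);
        }

        return result;
    }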
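
One thing to watch out for: in Lucene 7.x the default `new StandardAnalyzer()` removes English stop words, so for "This is a test" you should get only `test` back ("this", "is" and "a" are all on the default stop list). If you want every token, you can pass an empty stop-word set instead. Below is a minimal, self-contained sketch of that. The class name `TokenizeDemo` and the helper `tokenize` are made up for this example; the Lucene calls are the same ones used above, plus `CharArraySet.EMPTY_SET`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical demo class; the name is not from the original post.
public class TokenizeDemo {

    // Collects the terms the given analyzer produces for the text.
    static List<String> tokenize(Analyzer analyzer, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        try (TokenStream stream = analyzer.tokenStream("nofield", new StringReader(text))) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                  // required before the first incrementToken()
            while (stream.incrementToken()) {
                terms.add(term.toString());
            }
            stream.end();                    // signals end of stream before close()
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // An empty stop-word set keeps every token (still lowercased).
        try (Analyzer keepAll = new StandardAnalyzer(CharArraySet.EMPTY_SET)) {
            System.out.println(tokenize(keepAll, "This is a test")); // prints [this, is, a, test]
        }
    }
}
```

Passing the analyzer in as a parameter makes it easy to compare the default stop-word behaviour with the keep-everything variant on the same input.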
