Parse/tokenize a string with Lucene 7.3.0

To parse a string into terms with Lucene like so:

List<String> terms = parseText("This is a test");

Add the following Maven dependency:
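
Assuming the lucene-core artifact is all that is needed (in the 7.x line it contains StandardAnalyzer as well as the other Lucene classes imported below), the dependency would look roughly like this:

<!-- Assumed coordinates: lucene-core for the 7.3.0 release named in the title -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>7.3.0</version>
</dependency>

If your build already pulls in Lucene through another artifact, adjust the coordinates accordingly.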


And these imports:

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

And the code:

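    // StandardAnalyzer lowercases terms; in Lucene 7.x its no-argument
    // constructor also drops common English stop words.
    private static final Analyzer analyzer = new StandardAnalyzer();

    public static List<String> parseText(String text) {
        List<String> result = new ArrayList<>();
        if (text == null) {
            return result;
        }
        // The field name is arbitrary here; try-with-resources closes the stream.
        try (TokenStream stream = analyzer.tokenStream("nofield", new StringReader(text))) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset(); // must be called before the first incrementToken()
            while (stream.incrementToken()) {
                result.add(term.toString());
            }
            stream.end();
        } catch (IOException e) {
            // Not expected when reading from a StringReader; do not swallow it silently.
            throw new RuntimeException(e);
        }
        return result;
    }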
