{"id":5653,"date":"2018-05-06T13:02:05","date_gmt":"2018-05-06T13:02:05","guid":{"rendered":"http:\/\/www.garysieling.com\/blog\/?p=5653"},"modified":"2018-05-06T13:02:05","modified_gmt":"2018-05-06T13:02:05","slug":"parse-tokenize-string-lucene-7-3-0","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/parse-tokenize-string-lucene-7-3-0\/","title":{"rendered":"Parse\/tokenize a string with Lucene 7.3.0"},"content":{"rendered":"<p>To tokenize a string with Lucene, like so:<\/p>\n<pre lang=\"java\">\nList<String> terms = parseText(\"This is a test\");\n<\/pre>\n<p>Add the following Maven dependency:<\/p>\n<pre lang=\"xml\">\n    <dependency>\n      <groupId>org.apache.lucene<\/groupId>\n      <artifactId>lucene-core<\/artifactId>\n      <version>7.3.0<\/version>\n    <\/dependency>\n<\/pre>\n<p>And imports:<\/p>\n<pre lang=\"java\">\nimport java.io.IOException;\nimport java.io.StringReader;\nimport java.io.UncheckedIOException;\nimport java.util.ArrayList;\nimport java.util.List;\n\nimport org.apache.lucene.analysis.Analyzer;\nimport org.apache.lucene.analysis.TokenStream;\nimport org.apache.lucene.analysis.tokenattributes.CharTermAttribute;\nimport org.apache.lucene.analysis.standard.StandardAnalyzer;\n<\/pre>\n<p>And the code:<\/p>\n<pre>\n    \/\/ Analyzers are reusable, so create one and share it.\n    private static final Analyzer analyzer = new StandardAnalyzer();\n\n    public static List<String> parseText(String text) {\n        \/\/ Nothing to tokenize.\n        if (text == null) {\n            return new ArrayList<String>();\n        }\n\n        List<String> result = new ArrayList<String>();\n\n        \/\/ The field name is a placeholder; StandardAnalyzer does not use it.\n        \/\/ try-with-resources closes the stream so the analyzer can be reused.\n        try (TokenStream stream = analyzer.tokenStream(\"nofield\", new StringReader(text))) {\n            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);\n            stream.reset();\n\n            while (stream.incrementToken()) {\n                result.add(term.toString());\n            }\n\n            stream.end();\n        } catch (IOException e) {\n            throw new UncheckedIOException(e);\n        }\n\n        return result;\n    }\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Tokenizing text with 
Lucene<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[4],"tags":[300,348,517],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/5653"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=5653"}],"version-history":[{"count":0,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/5653\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=5653"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=5653"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=5653"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}