{"id":5083,"date":"2016-09-22T01:44:52","date_gmt":"2016-09-22T01:44:52","guid":{"rendered":"http:\/\/www.garysieling.com\/blog\/?p=5083"},"modified":"2016-09-22T01:44:52","modified_gmt":"2016-09-22T01:44:52","slug":"change-tika-output-format-text","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/change-tika-output-format-text\/","title":{"rendered":"Change Tika output format to text"},"content":{"rendered":"<p>Tika supports multiple output formats. The default is XHTML, which seems like an odd choice.<\/p>\n<p>You can change it to plain text with the <code>-t<\/code> flag, like so:<\/p>\n<pre lang=\"bash\">\njava -jar tika.jar -t\n<\/pre>\n<p>There are options for several other output formats. You&#8217;ll need to decide, for example, whether you want metadata from Office documents or not.<\/p>\n<pre lang=\"bash\">\nusage: java -jar tika-app.jar [option...] [file|port...]\n    -x  or --xml           Output XHTML content (default)\n    -h  or --html          Output HTML content\n    -t  or --text          Output plain text content\n    -T  or --text-main     Output plain text content (main content only)\n    -m  or --metadata      Output only metadata\n    -j  or --json          Output metadata in JSON\n    -y  or --xmp           Output metadata in XMP\n    -l  or --language      Output only language\n    -d  or --detect        Detect document type\n    -eX or --encoding=X    Use output encoding X\n    -pX or --password=X    Use document password X\n    -z  or --extract       Extract all attachements into current directory\n    --extract-dir=&lt;dir&gt;    Specify target directory for -z\n    -r  or --pretty-print  For XML and XHTML outputs, adds newlines and\n                           whitespace, for better readability\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Change the output format of 
Tika<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[3],"tags":[300,517,545],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/5083"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=5083"}],"version-history":[{"count":0,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/5083\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=5083"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=5083"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=5083"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}