{"id":4269,"date":"2016-05-31T02:06:48","date_gmt":"2016-05-31T02:06:48","guid":{"rendered":"http:\/\/www.garysieling.com\/blog\/?p=4269"},"modified":"2016-05-31T02:06:48","modified_gmt":"2016-05-31T02:06:48","slug":"extracting-text-strings-javascript-file","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/extracting-text-strings-javascript-file\/","title":{"rendered":"Extracting all the text strings from a Javascript file"},"content":{"rendered":"<p>To get all the strings from a JavaScript file, you&#8217;ll need to parse the file into an abstract syntax tree (AST), then walk it.<\/p>\n<p>You can do this easily with the Esprima library (for parsing) and Estraverse (for walking the AST).<\/p>\n<p>To identify strings, we need to check for nodes of type &#8220;Literal&#8221;, then check whether the value is actually a string, since numbers, booleans, null, and regular expressions are also literals (I&#8217;m doing this with lodash, out of convenience):<\/p>\n<pre lang=\"javascript\">\nconst fs = require(\"fs\");\nconst esprima = require(\"esprima\");\nconst estraverse = require(\"estraverse\");\nconst _ = require(\"lodash\");\nconst filename = \"node_modules\/react\/dist\/react.js\";\n\n\/\/ Read the file as a UTF-8 string rather than a Buffer\nconst ast = esprima.parse(\n  fs.readFileSync(filename, \"utf8\")\n);\n\nestraverse.traverse(ast, {\n  enter: (node, parent) => {\n    if (node.type === \"Literal\") {\n      if (_.isString(node.value)) {\n        console.log(node.value);\n      }\n    }\n  }\n});\n<\/pre>\n<p>If you want the exact location in the file, or the line numbers, you can request these by passing an options object as the second argument to the parse function:<\/p>\n<pre lang=\"javascript\">\n{\n loc: true,\n range: true,\n tokens: true,\n comment: true\n}\n<\/pre>\n<p>If you want to filter the strings by their context in the file, you&#8217;ll need to keep track of this yourself.<\/p>\n<p>The way to do this is to create a stack, and push \/ pop to it as the tree is walked:<\/p>\n<pre lang=\"javascript\">\nlet tree = [];\n\nestraverse.traverse(ast, {\n  enter: (node) => {\n    
tree.push(\n      _.omit(\n        node,\n        [\"left\", \"right\"])\n    );\n    ...\n  },\n  leave: (node) => {\n    tree.pop();\n  }\n});\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>How to parse Javascript files to extract strings<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[12],"tags":[302],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/4269"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=4269"}],"version-history":[{"count":0,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/4269\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=4269"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=4269"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=4269"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}