Extracting all the text strings from a JavaScript file

To get all the strings from a JavaScript file, you’ll need to parse the file into an abstract syntax tree (AST), then walk it.

You can do this easily with the Esprima library (for parsing) and Estraverse (for walking the AST).

To identify strings, we need to check for nodes of type “Literal”, and check whether the value is actually a string, since a Literal can also be a number, boolean, null or regular expression (I’m doing this with lodash, out of convenience):

const fs = require("fs");
const esprima = require("esprima");
const estraverse = require("estraverse");
const _ = require("lodash");
const filename = "node_modules/react/dist/react.js";

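// Read the file as UTF-8 text: esprima.parse expects a source string,
// and readFileSync without an encoding returns a Buffer.
const ast = esprima.parse(
  fs.readFileSync(filename, "utf8")
);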

estraverse.traverse(ast, {
  enter: (node) => {
    // Literal nodes also cover numbers, booleans, null and regexes,
    // so keep only those whose value is a string.
    if (node.type === "Literal" && _.isString(node.value)) {
      console.log(node.value);
    }
  }
});
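
One thing the check above will miss is template literals (backtick strings). As far as I know, Esprima follows the ESTree format here, where the text chunks of a template literal are TemplateElement nodes with a value of the form { cooked, raw }. A minimal sketch of a helper that handles both kinds of node; the name stringFromNode is my own and not part of either library:

```javascript
// Hypothetical helper: returns the string carried by a node, or null.
// Assumes ESTree-shaped nodes, as produced by Esprima.
function stringFromNode(node) {
  // Quoted strings: a Literal whose value is a string.
  if (node.type === "Literal" && typeof node.value === "string") {
    return node.value;
  }
  // Template literal chunks: prefer the cooked text (escapes resolved),
  // fall back to the raw source text if cooked is not a string.
  if (node.type === "TemplateElement") {
    const { cooked, raw } = node.value;
    return typeof cooked === "string" ? cooked : raw;
  }
  return null;
}
```

Inside the enter callback you would call stringFromNode(node) and log the result when it is not null. Template literals often produce empty chunks, so you may want to skip empty strings too.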

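If you want the exact location in the file, or the line numbers, you can request these by passing an options object as the second argument to the parse function. Here loc adds line and column numbers to each node and range adds character offsets, while tokens and comment additionally collect the token list and the comments: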

{
 loc: true,
 range: true,
 tokens: true,
 comment: true
}
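
With loc enabled, each node carries a loc object with start and end positions, where line is 1-based and column is 0-based. A small sketch of how you might turn that into an editor-style "file:line:column" label; formatLocation is a hypothetical helper name, not something Esprima provides:

```javascript
// Hypothetical helper: format a node's start position as "file:line:column".
// Assumes the AST was parsed with { loc: true }, so node.loc.start exists.
function formatLocation(file, node) {
  if (!node.loc) {
    return file; // parsed without loc: true, so no position is available
  }
  const { line, column } = node.loc.start;
  // Columns are 0-based in the AST; most editors display them 1-based.
  return `${file}:${line}:${column + 1}`;
}
```

In the traversal above, console.log(formatLocation(filename, node), node.value) would then print each string next to where it was found.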

If you want to filter the strings by their context in the file, you’ll need to keep track of this yourself.

One way to do this is to keep a stack of ancestor nodes, pushing to it when a node is entered and popping from it when the node is left, as the tree is walked:

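let tree = [];

estraverse.traverse(ast, {
  enter: (node) => {
    // Push a shallow copy of the node (minus its "left" and "right"
    // children) so the stack holds the path to the current node.
    tree.push(
      _.omit(
        node,
        ["left", "right"])
    );
    ...
  },
  leave: (node) => {
    // Pop from the same stack that enter pushes to.
    tree.pop();
  }
});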
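
To make the stack idea concrete, here is a dependency-free sketch. It swaps Estraverse for a tiny hand-rolled walk function and runs on a hand-written ESTree-style object instead of real parser output, so walk, stringsWithContext and sampleAst are all illustrative names of mine, not library APIs:

```javascript
// Hypothetical stand-in for estraverse.traverse: visits nodes depth-first,
// calling enter before a node's children and leave after them.
function walk(node, visitor) {
  visitor.enter(node);
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      // Anything with a string "type" property is treated as an AST node.
      if (child && typeof child.type === "string") {
        walk(child, visitor);
      }
    }
  }
  visitor.leave(node);
}

// Collect every string literal together with the types of its ancestors.
function stringsWithContext(ast) {
  const stack = [];
  const found = [];
  walk(ast, {
    enter(node) {
      if (node.type === "Literal" && typeof node.value === "string") {
        found.push({ value: node.value, ancestors: stack.map((n) => n.type) });
      }
      stack.push(node);
    },
    leave() {
      stack.pop();
    },
  });
  return found;
}

// Hand-written tree, roughly: "use strict"; console.log("hello");
const sampleAst = {
  type: "Program",
  body: [
    { type: "ExpressionStatement", expression: { type: "Literal", value: "use strict" } },
    {
      type: "ExpressionStatement",
      expression: {
        type: "CallExpression",
        callee: { type: "Identifier", name: "log" },
        arguments: [{ type: "Literal", value: "hello" }],
      },
    },
  ],
};

// Keep only the strings that appear somewhere inside a function call.
const inCalls = stringsWithContext(sampleAst)
  .filter((s) => s.ancestors.includes("CallExpression"))
  .map((s) => s.value);
```

The same filter works with the Estraverse version: inspect the stack inside enter before pushing the current node.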