{"id":2107,"date":"2014-03-03T13:49:07","date_gmt":"2014-03-03T13:49:07","guid":{"rendered":"http:\/\/www.garysieling.com\/blog\/?p=2107"},"modified":"2020-03-31T00:46:31","modified_gmt":"2020-03-31T00:46:31","slug":"convert-scikit-learn-decision-trees-json","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/convert-scikit-learn-decision-trees-json\/","title":{"rendered":"Convert scikit-learn decision trees to JSON"},"content":{"rendered":"<p>scikit-learn has a function to export decision trees to &#8220;graphviz&#8221; (for rendering), but I find JSON more helpful: it is easier to read, and you can use it in web apps. The function below will give you JSON.<\/p>\n<p>This is necessary (rather than calling json.dumps) because the decision tree classes don&#8217;t support the interfaces the JSON library needs. Additionally, even if they did, the JSON library in Python dies on very small floating point numbers, which is why it&#8217;s not used at all in my version.<\/p>\n<pre lang=\"python\">\n\ndef treeToJson(decision_tree, feature_names=None):\n  # Imported here so the nested helpers can reach TREE_LEAF and Tree.\n  import sklearn.tree._tree\n\n  js = \"\"\n\n  def node_to_str(tree, node_id, criterion):\n    # Fall back to a generic label when no criterion name is given.\n    if not isinstance(criterion, str):\n      criterion = \"impurity\"\n\n    value = tree.value[node_id]\n    if tree.n_outputs == 1:\n      value = value[0, :]\n\n    jsonValue = ', '.join([str(x) for x in value])\n\n    # Leaf nodes carry class values; internal nodes carry a split rule.\n    if tree.children_left[node_id] == sklearn.tree._tree.TREE_LEAF:\n      return '\"id\": \"%s\", \"criterion\": \"%s\", \"impurity\": \"%s\", \"samples\": \"%s\", \"value\": [%s]' \\\n             % (node_id, \n                criterion,\n                tree.impurity[node_id],\n                tree.n_node_samples[node_id],\n                jsonValue)\n    else:\n      if feature_names is not None:\n        feature = feature_names[tree.feature[node_id]]\n      else:\n        feature = tree.feature[node_id]\n\n      # str() keeps this check valid when feature is an integer index.\n      if \"=\" in str(feature):\n        
ruleType = \"=\"\n        ruleValue = \"false\"\n      else:\n        ruleType = \"<=\"\n        ruleValue = \"%.4f\" % tree.threshold[node_id]\n\n      return '\"id\": \"%s\", \"rule\": \"%s %s %s\", \"%s\": \"%s\", \"samples\": \"%s\"' \\\n             % (node_id, \n                feature,\n                ruleType,\n                ruleValue,\n                criterion,\n                tree.impurity[node_id],\n                tree.n_node_samples[node_id])\n\n  def recurse(tree, node_id, criterion, parent=None, depth=0):\n    tabs = \"  \" * depth\n    js = \"\"\n\n    left_child = tree.children_left[node_id]\n    right_child = tree.children_right[node_id]\n\n    js = js + \"\\n\" + \\\n         tabs + \"{\\n\" + \\\n         tabs + \"  \" + node_to_str(tree, node_id, criterion)\n\n    # Internal nodes always have both children, so emit left and right together.\n    if left_child != sklearn.tree._tree.TREE_LEAF:\n      js = js + \",\\n\" + \\\n           tabs + '  \"left\": ' + \\\n           recurse(tree, \\\n                   left_child, \\\n                   criterion=criterion, \\\n                   parent=node_id, \\\n                   depth=depth + 1) + \",\\n\" + \\\n           tabs + '  \"right\": ' + \\\n           recurse(tree, \\\n                   right_child, \\\n                   criterion=criterion, \\\n                   parent=node_id,\n                   depth=depth + 1)\n\n    js = js + tabs + \"\\n\" + \\\n         tabs + \"}\"\n\n    return js\n\n  # Accept either a low-level Tree or a fitted estimator that wraps one.\n  if isinstance(decision_tree, sklearn.tree._tree.Tree):\n    js = js + recurse(decision_tree, 0, criterion=\"impurity\")\n  else:\n    js = js + recurse(decision_tree.tree_, 0, criterion=decision_tree.criterion)\n\n  return js\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>scikit-learn has a function to export decision trees to &#8220;graphviz&#8221; (for rendering), but I find JSON more helpful: it is easier to read, and you can use it in web apps. The function below will give you JSON. 
This is necessary (rather than calling json.dumps) because the decision tree classes &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.garysieling.com\/blog\/convert-scikit-learn-decision-trees-json\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Convert scikit-learn decision trees to JSON&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[4,5,6],"tags":[164,302,322,447,508],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/2107"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=2107"}],"version-history":[{"count":1,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/2107\/revisions"}],"predecessor-version":[{"id":6498,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/2107\/revisions\/6498"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=2107"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=2107"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=2107"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}