Gary Sieling

Decision Tree Testing Lessons

I’m running some tests on sklearn decision trees [1], and the lessons learned so far may be interesting.

I’ve put my measurement code at the end. It tracks the percentage of correct predictions, the number of positive and negative predictions, and the number of false positives and false negatives.

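# Counters for the evaluation run; label 0 is negative, anything else is positive.
testsRun = 0
testsPassed = 0
testsFalseNegative = 0
testsFalsePositive = 0
testsPositive = 0
testsNegative = 0
for t in test:
  # predict() expects a 2D array of samples, so wrap the single sample in a list
  prediction = clf.predict([t])[0]
  if prediction == 0:
    testsNegative = testsNegative + 1
  else:
    testsPositive = testsPositive + 1

  # test_v holds the expected label for each entry in test, in the same order
  if prediction == test_v[testsRun]:
    testsPassed = testsPassed + 1
  else:
    if prediction == 0:
      testsFalseNegative = testsFalseNegative + 1
    else:
      testsFalsePositive = testsFalsePositive + 1

  testsRun = testsRun + 1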
 
# Multiply by 100.0 so the division is not truncated to an integer.
print("Percent Pass: {0:.1f}".format(100.0 * testsPassed / testsRun))
print("Percent Positive: {0:.1f}".format(100.0 * testsPositive / testsRun))
print("Percent Negative: {0:.1f}".format(100.0 * testsNegative / testsRun))
# False positives and false negatives are reported as shares of all wrong predictions.
testsFailed = testsFalsePositive + testsFalseNegative
if testsFailed > 0:
  print("Percent False positive: {0:.1f}".format(100.0 * testsFalsePositive / testsFailed))
  print("Percent False negative: {0:.1f}".format(100.0 * testsFalseNegative / testsFailed))
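
The same bookkeeping could also be wrapped in a small function, which makes it easier to rerun against different trees. This is only a sketch: the names tally_predictions and percent are made up for illustration, the labels below are invented sample data rather than results from my tests, and it assumes binary labels where 0 means negative.

```python
def tally_predictions(predictions, expected):
    """Count outcomes for binary labels (0 is negative, anything else is positive)."""
    counts = {"run": 0, "passed": 0, "positive": 0, "negative": 0,
              "false_positive": 0, "false_negative": 0}
    for predicted, actual in zip(predictions, expected):
        counts["run"] += 1
        if predicted == 0:
            counts["negative"] += 1
        else:
            counts["positive"] += 1
        if predicted == actual:
            counts["passed"] += 1
        elif predicted == 0:
            counts["false_negative"] += 1
        else:
            counts["false_positive"] += 1
    return counts


def percent(part, whole):
    """Return part as a percentage of whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


# Invented labels, purely for illustration.
predictions = [1, 0, 1, 1, 0, 0]
expected = [1, 0, 0, 1, 1, 0]

counts = tally_predictions(predictions, expected)
errors = counts["false_positive"] + counts["false_negative"]
print("Percent Pass: {0:.1f}".format(percent(counts["passed"], counts["run"])))
print("Percent False positive: {0:.1f}".format(percent(counts["false_positive"], errors)))
print("Percent False negative: {0:.1f}".format(percent(counts["false_negative"], errors)))
```

With a trained classifier, predictions would presumably come from a single clf.predict(test) call rather than a hand-written list, and expected would be test_v.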

  1. http://www.garysieling.com/blog/building-decision-tree-python-postgres-data
  2. http://www.garysieling.com/blog/sklearn-gini-vs-entropy-criteria
  3. http://www.garysieling.com/blog/convert-scikit-learn-decision-trees-json