I’m working my way through Programming Computer Vision with Python, a compact introduction to computer vision. Computer vision is a fascinating branch of computer science that has advanced rapidly in recent years through a combination of Department of Defense research on self-driving cars, video game development, and rapid improvements in computer hardware.
I’m writing a series of examples using cell phone pictures of a Redbox, a movie rental kiosk that became prevalent after the advent of Netflix, helping to seal the demise of Blockbuster. Each kiosk is a vending machine combined with a large movie billboard showing which movies are currently on offer:
This image is easy for a person to read, but it presents an interesting series of challenges for a computer. While the red lines are important, there is also a lot of contrast in the lines between the bricks, and the machine has no way of knowing whether those are important too. It also has to ignore the elbow in the picture. The image is a little crooked and is washed out at the top and bottom, presumably due to the angle of a plastic covering. The red lines are very bold, but most examples in the book start by converting images to grayscale, so some color correction is needed ahead of time.
As a first step towards color correcting the image, Programming Computer Vision with Python defines a useful function for histogram equalization, which makes the histogram of pixel intensities more even. The intended effect is to increase contrast and spread the pixel values across the whole range, so that when we filter on a cutoff value, each step in the cutoff removes a roughly even share of the pixels.
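    from numpy import histogram, interp

    def histeq(im, nbr_bins=256):
        """Histogram-equalize a grayscale image array."""
        data = im.flatten()
        # density=True replaces the normed=True argument dropped by newer NumPy
        imhist, bins = histogram(data, nbr_bins, density=True)
        cdf = imhist.cumsum()        # cumulative distribution function
        cdf = 255 * cdf / cdf[-1]    # normalize so the output spans 0..255
        # map each pixel through the CDF by linear interpolation
        im2 = interp(data, bins[:-1], cdf)
        return im2.reshape(im.shape)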
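To see why equalization helps with filtering, here is a minimal sketch on a made-up low-contrast image. The `equalize` helper, the synthetic `dull` array, and the cutoff of 128 are assumptions for illustration only; they are not taken from the book or from the Redbox photo.

```python
import numpy as np

def equalize(im, nbr_bins=256):
    """Histogram-equalize a grayscale array so its values spread over 0..255."""
    flat = im.flatten()
    hist, bins = np.histogram(flat, nbr_bins, density=True)
    cdf = hist.cumsum()
    cdf = 255 * cdf / cdf[-1]                  # scale the CDF to end at 255
    return np.interp(flat, bins[:-1], cdf).reshape(im.shape)

# Hypothetical low-contrast image: every value squeezed into 100..139.
rng = np.random.default_rng(0)
dull = rng.integers(100, 140, size=(64, 64)).astype(float)
bright = equalize(dull)

cutoff = 128
frac_before = (dull > cutoff).mean()     # share of pixels kept before equalizing
frac_after = (bright > cutoff).mean()    # share of pixels kept after equalizing
print(f"range before: {dull.min():.0f}..{dull.max():.0f}, kept: {frac_before:.2f}")
print(f"range after:  {bright.min():.0f}..{bright.max():.0f}, kept: {frac_after:.2f}")
```

After equalization, a cutoff near the middle of the range should keep roughly half of the pixels, which is what makes the cutoff easier to tune.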
Next, we open the image and apply filters. This function does a few things. It first flips the image vertically: I found that some of the later math was flipping it for me, so flipping it up front reverses that effect. It then applies a color matrix that ignores all colors but red, which improves the output a little bit, and converts the result to grayscale.
Finally, it thresholds the grayscale values, keeping only the pixels brighter than a cutoff. This is a fairly crude technique, presented in the book as a way to help later algorithms focus on what to look at. I extracted the cutoff as a parameter, which helps in tuning the result later.
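    from numpy import array
    from PIL import Image

    def fn(filt):
        # dir and redbox1 (image folder and file name) are defined elsewhere
        img1 = Image.open(dir + redbox1).transpose(Image.FLIP_TOP_BOTTOM)
        # 3x4 matrix, one row per output channel: R, G and B all take the red value
        rgb2xyz = (
            1, 0, 0, 0,
            1, 0, 0, 0,
            1, 0, 0, 0)
        img1 = img1.convert("RGB", rgb2xyz)
        img1 = array(img1.convert('L'))    # grayscale, as a NumPy array
        img1 = histeq(img1)
        img1_f = 1 * (img1 > filt)         # 1 where brighter than the cutoff, else 0
        return img1_f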
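The 12-value tuple can be hard to read at a glance. The sketch below emulates it with plain NumPy instead of calling Pillow, on the assumption that the tuple is read as a 3x4 matrix (one row per output channel, with the fourth column a constant offset). The `apply_matrix` helper and the three-pixel test image are hypothetical and exist only for this illustration.

```python
import numpy as np

# One row per output channel; columns are (R, G, B, constant offset).
matrix = np.array([
    [1, 0, 0, 0],   # new R = old R
    [1, 0, 0, 0],   # new G = old R
    [1, 0, 0, 0],   # new B = old R
], dtype=float)

def apply_matrix(rgb, m):
    """Apply a 3x4 color matrix to an (H, W, 3) uint8 image."""
    h, w, _ = rgb.shape
    offset = np.ones((h, w, 1))
    rgb1 = np.concatenate([rgb.astype(float), offset], axis=2)   # (H, W, 4)
    out = rgb1 @ m.T                                             # (H, W, 3)
    return np.clip(out, 0, 255).astype(np.uint8)

# Hypothetical 1x3 image: one pure red, one pure green, one pure blue pixel.
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
red_only = apply_matrix(pixels, matrix)

# Same thresholding idiom as fn(): 1 where the value exceeds the cutoff.
mask = 1 * (red_only[..., 0] > 90)
print(red_only[0].tolist())
print(mask.tolist())
```

Only the red pixel survives the threshold, because the green and blue pixels carry no red and are mapped to black.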
One of the challenges of reading a computer vision book is that there is a lot of math to take in, matrix algebra in particular. Thankfully there are handy Python libraries that wrap a lot of this for us. One such library, SciPy, can count the number of items in an image by labeling contiguous blocks of pixels:
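    from numpy import *
    from pylab import *
    from scipy.ndimage import binary_opening, label

    def count_items(img1_f, it):
        # opening (erosion, then dilation) with a 9x5 block removes thin artifacts
        im_open = binary_opening(img1_f, ones((9, 5)), iterations=it)
        # label each connected region; nbr_objects_open is the region count
        labels_open, nbr_objects_open = label(im_open)
        print(nbr_objects_open)    # report how many items were found
        return labels_open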
This demonstrates a neat trick. Items in an image that are close together sometimes have small rendering artifacts between them that make them look like a single unit. Binary opening erodes the image and then dilates it again, and the “iterations” parameter tells the library to make several passes at chipping away these artifacts.
We can then render the image on a graph using matplotlib, drawing the outlines of the “labels” on top:
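A small synthetic case shows the effect of the iteration count. This is a sketch under assumptions: the two squares, the three-pixel-wide bridge, and the 3x3 structuring element are made up for illustration and differ from the 9x5 block used on the Redbox photo.

```python
import numpy as np
from scipy.ndimage import binary_opening, label

# Hypothetical binary image: two 7x7 squares joined by a 3-pixel-wide bridge.
img = np.zeros((11, 21), dtype=int)
img[2:9, 2:9] = 1        # left square
img[2:9, 12:19] = 1      # right square
img[4:7, 9:12] = 1       # bridge, standing in for a rendering artifact

counts = {}
_, counts[0] = label(img)                  # 0 passes: count the raw image
for passes in (1, 2):
    # Opening runs `passes` erosions, then `passes` dilations.
    opened = binary_opening(img, structure=np.ones((3, 3)), iterations=passes)
    _, counts[passes] = label(opened)

print(counts)
```

Here one pass is not enough to break the bridge, so the two squares still count as one item; a second pass removes it and the count rises to two.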
    def fig(filt, it):
        clf()
        imshow(Image.open(dir + redbox1))
        gray()
        contour(count_items(fn(filt), it), origin='image')
        show()

    fig(90, 3)
    47
At this point, you can see we’re getting 47 items back from the image, which is a little high but not far off. To find the right values for the filter and iterations parameters, I looked at many combinations until I found the best fit.
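Trying many combinations by hand can also be scripted. The following is a rough sketch, not the process I actually used: `count_objects`, `sweep`, the synthetic four-tile image, and the target count are all hypothetical, and a 3x3 structuring element stands in for the 9x5 one.

```python
import numpy as np
from scipy.ndimage import binary_opening, label

def count_objects(gray, threshold, iterations):
    """Threshold a grayscale array, open it, and count connected regions."""
    mask = gray > threshold
    opened = binary_opening(mask, structure=np.ones((3, 3)), iterations=iterations)
    _, n = label(opened)
    return n

def sweep(gray, thresholds, iteration_counts, target):
    """Try every (threshold, iterations) pair; return the one closest to target."""
    results = {(t, it): count_objects(gray, t, it)
               for t in thresholds for it in iteration_counts}
    best = min(results, key=lambda k: abs(results[k] - target))
    return best, results

# Hypothetical test image: four bright 9x9 tiles on a dim, noisy background.
rng = np.random.default_rng(1)
gray = rng.integers(0, 60, size=(30, 30)).astype(float)
for r, c in [(3, 3), (3, 18), (18, 3), (18, 18)]:
    gray[r:r + 9, c:c + 9] = 200

best, results = sweep(gray, thresholds=[30, 90, 150],
                      iteration_counts=[1, 2, 3], target=4)
print(best, results[best])
```

This only works when the expected count is known in advance, which is the case for a photo you can count by eye; it says nothing about whether the regions found are the right ones.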
Here are just the edges on their own. They do a pretty good job of outlining each movie:
And here, by comparison, is one of many bad images:
On its own this isn’t that useful; ideally we want to extract which movies are there. What it does provide is a testing framework for future work. It also shows some of the challenges of tuning these algorithms, and that they do have the capacity to find items on their own.