knest grew out of a senior-year capstone project proposed by a high-ranking faculty member at my university. We planned and researched the initial implementation for about three to four months; however, we realized that the initial planning was all wrong, and ended up planning, implementing, and documenting the entire project in about two months. Much to our surprise, the final application won best in discipline for computer science at the annual state-wide competition. Let’s talk about each of the stages of the project in a series of blog posts.
This post and the next few that follow will detail our efforts during the development of the project. My hope is that someone will find these explanations helpful to their own work, or at least interesting enough to read.
What is knest?
The application can be most simply explained with the following excerpt from the README:
Take a folder that contains photos of birds and other stuff, get a folder of just birds, no other stuff.
The project comprises five stages that filter a collection of photographs by a combination of three factors: blur extent, object presence, and image similarity. Cropped copies of the original photographs that pass these criteria are then saved in a separate folder inside the originally selected folder. Essentially, knest is an application that leverages computer vision and deep learning techniques to non-destructively produce aesthetically pleasing copies of bird photographs. Yes, birds.
Let’s focus on the first stage of the application.
Stage One: Blur Detection
Blur detection is the first stage of the knest application, and it was important for filtering out images that would have been a waste to pass to the object classification and localization stages. Since the program was designed to run on consumer hardware, it was imperative that the team save computing power wherever possible.
Every single image file that is processed by knest is guaranteed to encounter the blur detection stage. Unintentional blur is a frequent occurrence in any type of photography, and wildlife photography is no exception. If the image lacks unique or discernible detail, then it’s harder for the human viewer to determine the salient parts of the image and downright impossible for an application to do it with reliability.1
So what makes something blurry? It all has to do with edges.
Edges: A Primer by Example
Edges, in the computer or machine vision parlance, are organized sets of points (read: lines) at which the brightness of an image changes significantly. Detecting edges is foundational to much of computer vision as a whole, and feature detection and extraction depends quite heavily upon it. For example, take the Canny edge detection algorithm.
In the foreground of the above picture, we see an astronaut dressed in their flight suit as they hold their helmet in their lap. In the background, we can see a United States flag on the left and a small-scale model of a space shuttle, both against a formless background. There are a number of things that are immediately apparent in the resultant image generated by the Canny edge detection process. First, at virtually any place where there is a sharp difference in color (grayscale intensity, in this case), there is an edge that corresponds to the location of that change. A few clear examples are the astronaut’s head against the backdrop, their neck and dark undershirt, the stars and stripes of the flag, and the bright rectangle on the astronaut’s helmet. The patches and pull cord, presumably made for high visibility, have a high level of detail in the edge image as well.
However, there are some edges that humans can readily recognize but that pose a bit of difficulty for this algorithm. Focus on the space shuttle model in the background. The top of the space shuttle faces towards the left side of the image and is almost completely white. Due to its closeness in color to the backdrop (as well as some blurring), the algorithm has a hard time determining the edge of the shuttle. The same situation is present with the left wing as well as the top ends of the two boosters. At the other extreme, the dark parts under the fuselage that extend towards the wings are close in color to the main rocket and also lack sharpness, so the algorithm has trouble there too.
The Canny algorithm is one of the oldest edge detection techniques; however, it still finds a lot of use for its simplicity and accessibility. I won’t be detailing the process in this post as it’s not the focus, but if you want to learn more about it, read this guide. It really comes down to taking the derivatives of a smoothed (de-noised) image matrix and then doing some thresholding to make sure that only strong edges are included in the resultant matrix. It really is an interesting algorithm, so maybe read the guide and come back for the rest of the post.
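To make the "smooth, differentiate, threshold" idea concrete, here is a minimal sketch in plain Python. It is not the Canny algorithm itself (there is no non-maximum suppression or hysteresis), and the function names, the tiny synthetic image, and the threshold of 50 are all assumptions chosen for illustration.

```python
def box_smooth(img):
    """3x3 mean filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [list(row) for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            total = sum(img[y + dy][x + dx]
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = total / 9.0
    return out


def gradient_magnitude(img):
    """Central-difference gradient magnitude; border pixels stay at 0."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def edge_map(img, threshold):
    """Mark pixels whose smoothed gradient magnitude exceeds the threshold."""
    mag = gradient_magnitude(box_smooth(img))
    return [[1 if m > threshold else 0 for m in row] for row in mag]


# Synthetic 6x8 image: dark left half, bright right half.
image = [[0] * 4 + [255] * 4 for _ in range(6)]
edges = edge_map(image, threshold=50)
for row in edges:
    print("".join("#" if e else "." for e in row))
```

Only the two columns straddling the dark-to-bright boundary are marked, and only on interior rows, since this sketch does not compute gradients at the border.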
So that’s the primer on edges…but what does that have to do with blur detection? Well, here’s a thought: if the first derivatives of an image matrix give us the locations where there are significant changes in intensity, then perhaps the second derivatives might give us some information about those intensity changes, e.g. how “fast” they happen. Let’s think of speed in this case as the area over which the change in intensity is distributed. If the significant change occurs in a small area, then one could think of that change as having a sharp dropoff. Conversely, if that change occurs in a larger, more spread out area, then one could think of the change as having a soft dropoff.
Furthermore, if we substitute “edge” for the concept of “significant change in intensity”, we’ve practically answered the question posed at the start of this section. A blurry edge is an edge in which the change in intensity is spread over a large area. And a blurry image is an image in which most of the edges are blurry.
Now that we’ve answered the question of what blur is in a (basic) computer vision sense, I’ll share the two approaches that our team tried.
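A one-dimensional toy example may help here. The two intensity profiles below are made up for illustration: both rise from 0 to 100, but one does it in a single step and the other over several pixels. The discrete second difference stands in for the second derivative.

```python
def second_difference(signal):
    """Discrete second derivative: s[i-1] - 2*s[i] + s[i+1]."""
    return [signal[i - 1] - 2 * signal[i] + signal[i + 1]
            for i in range(1, len(signal) - 1)]


# Same total change in intensity, spread over different widths.
sharp_edge = [0, 0, 0, 0, 100, 100, 100, 100]
blurry_edge = [0, 0, 20, 40, 60, 80, 100, 100]

peak_sharp = max(abs(v) for v in second_difference(sharp_edge))
peak_blurry = max(abs(v) for v in second_difference(blurry_edge))
print(peak_sharp, peak_blurry)
```

The sharp profile produces a much larger second-difference response than the gradual one, even though both edges span the same intensity range. That gap is the signal the blur measures in the next section try to exploit.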
First Attempt: Laplacian and Co.
Disclaimer: My expertise is not in mathematics. You’ll see.
Armed with this assumption that we could determine if images were blurry by measuring the “change of change” of intensity, we set off to find algorithms that would allow for such a thing. We found a number of methods that compute the second derivative of the image matrix through convolution with a small filter. And we felt that we had found an okay way of relating the information from those derivatives to the presence of blur in the image: variance. The thought was that images with a lot of sharp edges would have a higher variance, as “change in change” would be higher than in images where there was a significant lack of edges. So I crafted up a small set of functions to run some variance calculations, with all of the heavy convolution stuff being handled by OpenCV.
import cv2
import numpy as np

# LAP_THRESHOLD (the default minimum variance) is a module-level constant
# defined elsewhere in the project; its value is not shown in this post.


def variance(image):
    """
    Calculate the variance of the Laplacian response, which is a measure of
    the sharpness of an image. Blurry pictures have fewer regions containing
    rapid intensity changes, meaning that there will be less of a spread of
    responses (low variance). Sharper images will have a higher spread of
    responses (high variance).

    image: (String) path to the image being tested
    """
    image = cv2.imread(image)
    return cv2.Laplacian(image, cv2.CV_64F).var()


def teng(image):
    """
    Calculate the Tenengrad measure (mean squared Sobel gradient magnitude).

    image: (Array) Array (color/greyscale) representation of image
    """
    # gradient along x (dx=1, dy=0)
    gauss_x = cv2.Sobel(image, cv2.CV_64F, 1, 0)
    # gradient along y (dx=0, dy=1)
    gauss_y = cv2.Sobel(image, cv2.CV_64F, 0, 1)
    return np.mean((gauss_x * gauss_x) + (gauss_y * gauss_y))


def lapm(image):
    """
    Calculate the modified Laplacian measure.

    image: (Array) Array (color/greyscale) representation of image
    """
    # 1x3 second-derivative kernel; its transpose (3x1) acts vertically
    kernel = np.array([[-1.0, 2.0, -1.0]])
    # float output depth so negative responses are not clipped before abs()
    lap_x = np.abs(cv2.filter2D(image, cv2.CV_64F, kernel))
    lap_y = np.abs(cv2.filter2D(image, cv2.CV_64F, kernel.T))
    return np.mean(lap_x + lap_y)


def check_sharpness(image_path, threshold=LAP_THRESHOLD):
    """
    Determine whether the sharpness of an image exceeds the threshold for
    variance. Those surpassing the threshold will return True.

    image_path: (String) path to the image being tested
    threshold: (Float) minimum variance for acceptance
    """
    sharpness = variance(image_path)
    return sharpness, sharpness > threshold
We also tried some other things in our first attempt, e.g. Fourier transforms and discrete wavelet transforms. If you want to see the code as it stood then, feel free to explore the commit from that point in time.
There were probably many things wrong with our mathematical assertions, but there was one glaring oversight, and you may have already thought of it. The algorithm works all fine and dandy on images with a lot of different things in view, as there are plenty of edges from which to calculate derivatives and all that, like in this image:
But what happens in clear (non-blurry) images like the one below where there aren’t many edges upon which to calculate rates of change?
It fails, and quite badly, so using that was out of the question. We also tried using every combination of the other measures that we had found along with different thresholds for each. Disappointed, my team members and I shelved blur detection as a filter stage as we worked on other things.
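The failure can be reproduced on synthetic data. The sketch below is plain Python rather than our OpenCV code, and the two 32x32 test images, the 4-neighbour Laplacian, and the 3x3 box blur are all assumptions made for illustration: one image is perfectly sharp but contains a single small square, the other is full of stripes that have been blurred.

```python
def laplacian(img):
    """4-neighbour discrete Laplacian, computed on interior pixels only."""
    h, w = len(img), len(img[0])
    return [[img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
             - 4 * img[y][x]
             for x in range(1, w - 1)]
            for y in range(1, h - 1)]


def variance(values):
    """Population variance of a 2-D list of numbers."""
    flat = [v for row in values for v in row]
    mean = sum(flat) / len(flat)
    return sum((v - mean) ** 2 for v in flat) / len(flat)


def box_blur(img):
    """3x3 mean filter with coordinates clamped at the image border."""
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[sum(px(y + dy, x + dx)
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
             for x in range(w)]
            for y in range(h)]


SIZE = 32

# Sharp image with very few edges: one 4x4 white square on black.
sparse_sharp = [[255 if 14 <= y < 18 and 14 <= x < 18 else 0
                 for x in range(SIZE)]
                for y in range(SIZE)]

# Busy image (2-pixel-wide vertical stripes) that has then been blurred.
stripes = [[255 if x % 4 >= 2 else 0 for x in range(SIZE)]
           for _ in range(SIZE)]
busy_blurry = box_blur(stripes)

score_sparse = variance(laplacian(sparse_sharp))
score_busy = variance(laplacian(busy_blurry))
print(score_sparse, score_busy)
```

The blurred striped image scores higher than the sharp image of the single square, so any threshold low enough to accept the sharp image also accepts the blurry one. The score tracks how many edges there are as much as how sharp they are.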
Final Attempt: Haar Wavelets
Through some quite extensive research, we found our answer: wavelet transforms. According to Tong et al. 2, different edges are classified into four types: Dirac-Structure, Roof-Structure, Astep-Structure, and Gstep-Structure.
The parameter α for Roof-Structure and Gstep-Structure indicates the magnitude of sharpness for that edge.
The general idea of their research has two main parts. Dirac-Structure and Astep-Structure edges will typically disappear in a blurry image, no matter the source of the blur itself, whether it is lack of focus or a subject in motion. Roof-Structure and Gstep-Structure edges will remain in a blurry image but with a loss in sharpness. By combining these findings with something that can find irregular structures at different resolutions (wavelet transforms), the researchers created a scheme in which one can tell whether an image is blurry or not and, if so, the extent or severity of that blur. I will refer you to the paper for the mathematics of the theory; it’s only four pages including references, so it’s a fairly quick read.
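As a rough illustration of the multi-resolution part, the sketch below runs three levels of a 2-D Haar decomposition in plain Python and builds an edge map at each level by combining the three detail bands as sqrt(LH² + HL² + HH²). This is not the knest implementation: the function names, the averaging normalization (dividing by 4), and the 8x8 test image are assumptions, and the sketch stops before the windowed local-maximum step that the paper applies to each edge map.

```python
import math


def haar_level(img):
    """One level of a 2-D Haar transform (height and width must be even).

    Returns (LL, LH, HL, HH), each half the size of the input.
    """
    h, w = len(img) // 2, len(img[0]) // 2
    ll = [[0.0] * w for _ in range(h)]
    lh = [[0.0] * w for _ in range(h)]
    hl = [[0.0] * w for _ in range(h)]
    hh = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 4.0  # local average
            lh[i][j] = (a + b - c - d) / 4.0  # change between rows
            hl[i][j] = (a - b + c - d) / 4.0  # change between columns
            hh[i][j] = (a - b - c + d) / 4.0  # diagonal change
    return ll, lh, hl, hh


def edge_maps(img, levels=3):
    """Edge map per level; each level works on the previous LL band."""
    maps = []
    current = img
    for _ in range(levels):
        current, lh, hl, hh = haar_level(current)
        maps.append([[math.sqrt(lh[i][j] ** 2 + hl[i][j] ** 2 + hh[i][j] ** 2)
                      for j in range(len(lh[0]))]
                     for i in range(len(lh))])
    return maps


# 8x8 image with a sharp vertical step between columns 2 and 3.
image = [[0] * 3 + [200] * 5 for _ in range(8)]
maps = edge_maps(image)
for level, emap in enumerate(maps, start=1):
    print(level, emap[0])
```

Each level halves the resolution, so the same edge shows up with a different strength at each scale. Comparing those strengths across the three scales is what lets the scheme tell the four edge types apart.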
Thus, we rewrote the blur detection stage to use this new algorithm. We implemented it straight from the paper itself, and by using a couple of modules for image reading and Haar wavelet transforms, we came out with a rather sleek and efficient way to programmatically exclude blurry photos from being sent to more computationally expensive stages. The main function for the blur detection stage is included below.
def detect_blur(img_path):
    """
    Method where final image and blur classification is returned

    img_path: (String) absolute path to image file
    """
    # exception handling for OpenCV image reading and conversion
    try:
        # read image into a numpy array (BGR channel order)
        image = cv2.imread(img_path)
        # convert to grayscale
        img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    except Exception:
        return img_path, False

    # determine dimensions for resize
    size = determine_resize(img)
    img = cv2.resize(img, (size, size))

    # calculate local maxima of each edge map (3 in total)
    emax1, emax2, emax3 = calc_intensities(img)
    # calculate Nedge, Nda, Nrg and Nbrg
    Nedge, Nda, Nrg, Nbrg = calc_values(emax1, emax2, emax3)

    # ratio of Dirac- and Astep-Structure to all edges
    # if divisor is 0, set per to 0
    per = 0 if Nedge == 0 else Nda / Nedge
    # blur confidence coefficient; how many Roof- and Gstep-Structure edges are
    # blurred; if divisor is 0, set blur_extent to 0
    blur_extent = 0 if Nrg == 0 else Nbrg / Nrg

    # classify whether or not image is blurry
    result = blur_result(size, per, blur_extent)
    return image, result
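The helper that produces the four counts is not shown above, so here is a hedged sketch of how such counting might look, based on my reading of the rules in the paper. The function name count_edge_types, the flat-list inputs, the threshold of 35, and the sample values are all assumptions for illustration; this is not the calc_values function from knest.

```python
def count_edge_types(emax1, emax2, emax3, threshold):
    """Count edge types from per-location maxima at three scales.

    Returns (n_edge, n_da, n_rg, n_brg):
      n_edge: locations where any scale exceeds the threshold
      n_da:   Dirac-/Astep-Structure edges (emax1 > emax2 > emax3)
      n_rg:   Roof-/Gstep-Structure edges (rising across scales, or
              peaking at the middle scale)
      n_brg:  Roof-/Gstep-Structure edges whose finest-scale response
              is below the threshold, i.e. that have lost sharpness
    """
    n_edge = n_da = n_rg = n_brg = 0
    for e1, e2, e3 in zip(emax1, emax2, emax3):
        if max(e1, e2, e3) <= threshold:
            continue  # not an edge point at any scale
        n_edge += 1
        if e1 > e2 > e3:
            n_da += 1
        elif e1 < e2 < e3 or (e2 > e1 and e2 > e3):
            n_rg += 1
            if e1 < threshold:
                n_brg += 1
    return n_edge, n_da, n_rg, n_brg


# Made-up maxima for five locations (hypothetical values).
emax1 = [90, 10, 5, 40, 2]
emax2 = [60, 30, 50, 70, 3]
emax3 = [30, 80, 20, 50, 1]

n_edge, n_da, n_rg, n_brg = count_edge_types(emax1, emax2, emax3, threshold=35)
per = 0 if n_edge == 0 else n_da / n_edge
blur_extent = 0 if n_rg == 0 else n_brg / n_rg
print(n_edge, n_da, n_rg, n_brg, per, blur_extent)
```

A higher per means more of the edge types that vanish under blur are still present, which points to a sharp image. A higher blur_extent means more of the surviving Roof- and Gstep-Structure edges have lost their sharpness.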
I sincerely hope you found this post interesting; I tried to keep a good mix between technical detail and high-level overviews. If you have any tips, corrections, suggestions, or questions, feel free to reach out to me by email at firstname.lastname@example.org.
Next in the series: knest: part two (will update with link at publication)