Harris corner detector

A Harris corner detector extracts corner features from images. It identifies windows of the image containing corners as those where a small shift in the window position in any direction causes large changes in the pixel intensities within the window. This works because:

windows where shifts in only one direction cause large intensity changes contain an edge.
windows where no shift causes a large intensity change are flat regions of the image.
windows where shifts in two independent directions (and hence in any direction) cause large intensity changes contain a corner.

The number of directions in which a shift causes a large intensity change is determined by inspecting the eigenvalues of the structure tensor of the image.
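
To make the eigenvalue test concrete, here is a minimal Python sketch. It assumes the structure tensor of a window has already been reduced to a symmetric 2x2 matrix with entries `a`, `b`, `c`. The function names and the `threshold` value are hypothetical choices for this illustration, not part of the original method.

```python
import math


def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean + radius, mean - radius


def classify_window(a, b, c, threshold=1.0):
    """Label a window from the eigenvalues of its structure tensor.

    `threshold` is an arbitrary cut-off chosen for this example only.
    """
    large, small = eigenvalues_2x2(a, b, c)
    if small > threshold:
        return "corner"  # both eigenvalues are large
    if large > threshold:
        return "edge"    # only one eigenvalue is large
    return "flat"        # neither eigenvalue is large


print(classify_window(0.0, 0.0, 0.0))     # no gradient anywhere: flat
print(classify_window(50.0, 0.0, 0.0))    # gradient along one axis only: edge
print(classify_window(50.0, 0.0, 40.0))   # gradients along both axes: corner
print(classify_window(25.0, 25.0, 25.0))  # diagonal edge: edge
```

The diagonal case shows why the eigenvalues are inspected rather than the diagonal entries: both diagonal entries are large, yet only one eigenvalue is.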

Algorithm

The procedure for applying the Harris corner detector is as follows:

Convert the image to grayscale.
Optionally smooth the image with a Gaussian filter.
Compute the image gradients in the x and y directions, for example using the Sobel operator.
Calculate the Harris response.
Find points that have a high Harris response (above a threshold) and are local maxima in their neighborhood (e.g. within a 3x3 window). This is non-maximum suppression.
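
The steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation. It assumes the input is already a grayscale image stored as a list of rows, skips the optional Gaussian smoothing, sums gradient products over an unweighted 3x3 window, and scores each pixel with the commonly cited response \( \det(M) - k \, \operatorname{trace}(M)^2 \). The function names, the value \( k = 0.05 \), and the threshold are assumptions made for this example.

```python
def sobel_gradients(img):
    """Return (Iy, Ix) from 3x3 Sobel kernels; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    Iy = [[0.0] * w for _ in range(h)]
    Ix = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            # Right column minus left column, weighted 1-2-1.
            Ix[i][j] = (img[i - 1][j + 1] + 2 * img[i][j + 1] + img[i + 1][j + 1]
                        - img[i - 1][j - 1] - 2 * img[i][j - 1] - img[i + 1][j - 1])
            # Bottom row minus top row, weighted 1-2-1.
            Iy[i][j] = (img[i + 1][j - 1] + 2 * img[i + 1][j] + img[i + 1][j + 1]
                        - img[i - 1][j - 1] - 2 * img[i - 1][j] - img[i - 1][j + 1])
    return Iy, Ix


def harris_response(img, k=0.05):
    """Harris response per pixel, using an unweighted 3x3 window (assumed)."""
    Iy, Ix = sobel_gradients(img)
    h, w = len(img), len(img[0])
    R = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            syy = sxy = sxx = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    gy, gx = Iy[i + di][j + dj], Ix[i + di][j + dj]
                    syy += gy * gy
                    sxy += gx * gy
                    sxx += gx * gx
            det = syy * sxx - sxy * sxy
            trace = syy + sxx
            R[i][j] = det - k * trace * trace
    return R


def detect_corners(img, threshold, k=0.05):
    """Keep pixels whose response exceeds `threshold` and is a 3x3 local maximum."""
    R = harris_response(img, k)
    h, w = len(img), len(img[0])
    corners = []
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            neighbourhood = [R[i + di][j + dj] for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            if R[i][j] > threshold and R[i][j] == max(neighbourhood):
                corners.append((i, j))
    return corners


# Synthetic 12x12 test image: a bright quadrant whose corner sits at (6, 6).
image = [[1.0 if (i >= 6 and j >= 6) else 0.0 for j in range(12)] for i in range(12)]
print(detect_corners(image, threshold=1000.0))  # prints [(6, 6)]
```

On this synthetic image the two straight edges of the quadrant receive negative responses, so only the corner survives the threshold and the non-maximum suppression.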

Sum of squared differences, in matrix form.

The Harris corner detector can be described as calculating the sum of squared differences in a window, expressing this operation as a matrix, and then inspecting the magnitude of the two eigenvalues of this matrix.

Definitions needed.

Let \( I \) be a 2D grayscale image of size \( h \times w \), and let \( I(i, j) \) denote the pixel value at position \( (y, x) = (i, j) \). At each pixel, approximate the partial derivatives of the pixel intensity with respect to \( x \) and \( y \), for example with the Sobel operator. Denote the two resulting 2D arrays of image gradients as \( I_x \) and \( I_y \), so that \( I_x(i, j) \) is the gradient in the \( x \) direction at pixel \( (y, x) = (i, j) \).

For a displacement by the vector \( (h_y, h_x) \), the difference in pixel intensity for a single pixel is approximated to first order as:

\[ f_{y,x}(i,j) = h_y I_y(i, j) + h_x I_x(i, j) = \begin{bmatrix} h_y & h_x \end{bmatrix} \begin{bmatrix} I_y(i, j) \\ I_x(i, j) \end{bmatrix} \]

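Let \( W \) be an array (3x3, for example) containing the indices of the pixels in a window of the image. For example, the first row of \( W \) could be \( (70, 20), (70, 21), (70, 22) \).

The sum of squared differences in the window \( W \) is:

\[ \sum_{(i,j) \in W} \left( h_y I_y(i, j) + h_x I_x(i, j) \right)^2 = \begin{bmatrix} h_y & h_x \end{bmatrix} \left( \sum_{(i,j) \in W} \begin{bmatrix} I_y(i, j)^2 & I_x(i, j) I_y(i, j) \\ I_x(i, j) I_y(i, j) & I_x(i, j)^2 \end{bmatrix} \right) \begin{bmatrix} h_y \\ h_x \end{bmatrix} \]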
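
As a quick numerical check, the sketch below compares two ways of computing the sum of squared differences over a window: summing the squared first-order differences pixel by pixel, and evaluating a quadratic form in the displacement using a 2x2 matrix of summed gradient products. The gradient values are made-up integers and the function names are hypothetical. The window is a flat list of \( (i, j) \) index pairs.

```python
def gradient_product_matrix(Iy, Ix, window):
    """Sum the 2x2 matrix of gradient products over the (i, j) pairs in `window`."""
    syy = sxy = sxx = 0
    for i, j in window:
        syy += Iy[i][j] * Iy[i][j]
        sxy += Iy[i][j] * Ix[i][j]
        sxx += Ix[i][j] * Ix[i][j]
    return [[syy, sxy], [sxy, sxx]]


def ssd_direct(Iy, Ix, window, hy, hx):
    """Sum of squared first-order differences, one pixel at a time."""
    return sum((hy * Iy[i][j] + hx * Ix[i][j]) ** 2 for i, j in window)


def ssd_quadratic_form(M, hy, hx):
    """Evaluate [hy hx] M [hy hx]^T for a 2x2 matrix M."""
    return hy * (M[0][0] * hy + M[0][1] * hx) + hx * (M[1][0] * hy + M[1][1] * hx)


# Made-up integer gradients for a single 3x3 patch.
Iy = [[1, 0, 2], [0, 1, 1], [2, 1, 0]]
Ix = [[0, 1, 1], [2, 0, 1], [1, 1, 3]]
window = [(i, j) for i in range(3) for j in range(3)]

M = gradient_product_matrix(Iy, Ix, window)
print(M)                                   # [[12, 6], [6, 18]]
print(ssd_direct(Iy, Ix, window, 1, -2))   # 60
print(ssd_quadratic_form(M, 1, -2))        # 60
```

The matrix is computed once per window and can then be reused for every displacement, which is why the detector only needs to inspect the matrix itself.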

Structure tensor and the Harris response

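The 2x2 matrix of summed gradient products over the window \( W \) is the structure tensor:

\[ M = \sum_{(i,j) \in W} \begin{bmatrix} I_y(i, j)^2 & I_x(i, j) I_y(i, j) \\ I_x(i, j) I_y(i, j) & I_x(i, j)^2 \end{bmatrix} \]

Let \( \lambda_1 \) and \( \lambda_2 \) be the eigenvalues of \( M \). Both are small in a flat region, only one is large on an edge, and both are large at a corner. Rather than computing the eigenvalues directly, Harris and Stephens score each window with the response:

\[ R = \det(M) - k \, \operatorname{trace}(M)^2 = \lambda_1 \lambda_2 - k (\lambda_1 + \lambda_2)^2 \]

where \( k \) is an empirically chosen constant, commonly between 0.04 and 0.06. \( R \) is large and positive at a corner, negative on an edge, and close to zero in a flat region.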
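
As a rough illustration of how a single response score separates the three cases, the sketch below evaluates the widely cited form \( R = \det(M) - k \, \operatorname{trace}(M)^2 \) on three hand-picked 2x2 matrices. The function name, the value \( k = 0.05 \), and the example matrices are assumptions made for this illustration.

```python
def response_from_tensor(M, k=0.05):
    """Harris-style response for a symmetric 2x2 matrix M = [[a, b], [b, c]].

    Uses only the determinant and trace, so no eigenvalues are computed.
    The default k is an assumed, commonly quoted value.
    """
    (a, b), (_, c) = M
    det = a * c - b * b
    trace = a + c
    return det - k * trace * trace


flat = [[0.0, 0.0], [0.0, 0.0]]      # no gradient energy
edge = [[50.0, 0.0], [0.0, 0.0]]     # gradient energy along one axis only
corner = [[50.0, 0.0], [0.0, 40.0]]  # gradient energy along both axes

print(response_from_tensor(flat))    # zero
print(response_from_tensor(edge))    # negative
print(response_from_tensor(corner))  # positive
```

Because the determinant is the product of the eigenvalues and the trace is their sum, the sign of the response reflects whether one or both eigenvalues are large.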

Source

Chris Harris and Mike Stephens introduce the detector in their 1988 paper: "A Combined Corner and Edge Detector".