Thursday, August 6, 2009

Activity 12 Color Image Segmentation

So far we have been doing image segmentation and morphology on binary and gray-scale images. In the past activities simple thresholding was enough to separate our region of interest (ROI), but in some cases this won't be sufficient. There are instances where binarizing an image or converting it to gray-scale won't separate the ROI from the background; in such cases we need to use its color information. In this activity we use normalized chromaticity coordinates to do segmentation via color information.


Figure 1. Normalized Chromaticity Coordinates

The normalized chromaticity coordinates (NCC), shown in figure 1, can be thought of as a way of expressing 3-dimensional RGB information in a simpler 2-dimensional r-g color space. Each channel is divided by the sum of the three, that is r = R/(R+G+B) and similarly for g and b, so that r + g + b = 1. The third value is therefore redundant and it is enough to express color by only two values, in this case r and g.
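As a rough illustration, here is a minimal Scilab sketch of this conversion (the filename, and the assumption that SIP's imread returns an MxNx3 RGB array, are mine):

// Convert an RGB image to normalized chromaticity coordinates (NCC)
img = double(imread('candies.jpg'));   // hypothetical filename
R = img(:,:,1); G = img(:,:,2); B = img(:,:,3);
I = R + G + B;
I(I == 0) = 1;                         // avoid division by zero on black pixels
r = R ./ I;                            // normalized red chromaticity
g = G ./ I;                            // normalized green chromaticity
// b = 1 - r - g, so r and g are enough to describe the color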

After expressing an image in NCC we can now do the segmentation using the probability distribution function (PDF) of the ROI in NCC. By mapping the image to the PDF of the ROI we are able to separate the ROI from the rest of the image. In this activity we calculate the PDF of the ROI using two methods. The first is the parametric method, which assumes that the PDF is Gaussian: we simply obtain the mean and standard deviation of the r and g values of the ROI and use these to calculate its 2-dimensional PDF. The other is the non-parametric method, where we do not assume any form for the PDF and instead use the 2-dimensional r-g histogram of the ROI itself as the PDF. In both methods we obtain the PDF of the ROI from only a small patch of it.
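The sketch below outlines both methods in Scilab, continuing from the r and g arrays of the previous snippet; the patch filename, the histogram bin count, and the display cutoff are assumptions of mine rather than the exact values I used.

// --- Parametric method: assume the r and g of the ROI are Gaussian ---
patch = double(imread('patch.jpg'));            // hypothetical crop of the ROI
Rp = patch(:,:,1); Gp = patch(:,:,2); Bp = patch(:,:,3);
Ip = Rp + Gp + Bp;  Ip(Ip == 0) = 1;
rp = Rp ./ Ip;  gp = Gp ./ Ip;

mur = mean(rp);  sigr = stdev(rp);              // statistics of the patch
mug = mean(gp);  sigg = stdev(gp);

pr = exp(-(r - mur).^2 / (2*sigr^2)) / (sigr*sqrt(2*%pi));
pg = exp(-(g - mug).^2 / (2*sigg^2)) / (sigg*sqrt(2*%pi));
prob = pr .* pg;                                // joint PDF, r and g assumed independent
seg_param = prob / max(prob) > 0.1;             // hypothetical cutoff

// --- Non-parametric method: backproject the 2D r-g histogram of the patch ---
nbins = 32;                                     // assumed histogram resolution
hist2d = zeros(nbins, nbins);
rb = round(rp*(nbins-1)) + 1;  gb = round(gp*(nbins-1)) + 1;
for k = 1:length(rb)
    hist2d(rb(k), gb(k)) = hist2d(rb(k), gb(k)) + 1;
end
ri = round(r*(nbins-1)) + 1;  gi = round(g*(nbins-1)) + 1;
seg_nonparam = zeros(r);
for k = 1:length(ri)                            // each pixel gets its bin's count
    seg_nonparam(k) = hist2d(ri(k), gi(k));
end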

I used an image of multicolored candies to test these two methods for color image segmentation. Excluding the background, there are only three main colors in this image: red, orange, and green.

Figure 2 below shows the results of the segmentation using both parametric and non-parametric methods. For both methods, separating the green region was accomplished almost perfectly. The pixelated result of the parametric method is caused by the actual texture of the candy, a feature the parametric method cannot compensate for. On the other hand, the holes in the result of the non-parametric method are most likely due to the patch I used to sample the ROI: there are colors that are part of the ROI but were not included in the sample patch. Using a better sampled patch would improve the result in this case.

Compared to the non-parametric method, the parametric method did not fare as well when segmenting the red or orange regions of the image. Since the parametric method assumes a Gaussian PDF, it is highly probable that the PDFs of the red and orange ROIs overlap each other. Hence the parametric method fails when we try to separate colors that are very close to each other.

The non-parametric method does not suffer from the same problem since it uses the actual histogram of the ROI as the PDF for mapping. This means that even if two colors are very close to each other there won't be any confusion as long as the other color does not appear frequently inside the ROI. As we can see from the histograms, the orange and red are very close to each other but their histograms are almost mutually exclusive.




Figure 2. Original unsegmented image with the results of both parametric and non-parametric color segmentation. The patch of the ROI is also included as well as its histogram.

Overall, looking at the results, I have to say that the non-parametric method is the better color segmentation technique. Compared to the parametric method it is more versatile and more accurate, and the quality of its output is also better.

I give myself a grade of 10 in this activity.

Activity 11 Color Image Processing

We have learned in our Applied Physics 187 course that an image captured by a color digital camera is composed of an array of pixels, each having R, G, and B information. These three color values are each a product of the spectral profile of the light source, the object's reflectance, and the camera's spectral sensitivity for R, G, and B. To obtain the actual appearance of the object we must divide these R, G, and B values by a scaling/balancing factor equivalent to the R, G, and B values of a purely white object captured by the camera. This scaling factor is just the product of the camera's sensitivity and the illuminant's spectral profile. If the wrong scaling factor is used or it is miscalculated, the resulting image would not look right. The process of applying these scaling factors to a digital image is called White Balancing. In this activity we try out the White Patch and Gray World algorithms for doing White Balancing.

To test the effectiveness of these algorithms, we take pictures of objects containing the major hues that are obviously not properly balanced. We do so by not using the "auto white balance" function of the camera and instead selecting the different preset balancing conditions. In the camera we used, these conditions are called "daylight", "cloudy", "fluorescent", and "incandescent".

The White Patch algorithm makes use of information from "obviously white" regions in the image. That is, we calculate the balancing factor by taking the average R, G, and B values within a white patch of the image. On the other hand, the Gray World algorithm assumes that the average color of the world is gray. Since the ratio of the R, G, and B values for gray is the same as for white or black, this method takes the RGB of gray and uses it for White Balancing. In the Gray World algorithm the RGB value of gray is taken as the average R, G, and B of the whole image (the whole world).
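A minimal Scilab sketch of the two balancing schemes is shown below; the filename, the patch coordinates, and the assumption that imread returns values already scaled to [0, 1] are mine.

// White Patch and Gray World balancing on an RGB image
img = double(imread('scene.jpg'));              // hypothetical filename
R = img(:,:,1); G = img(:,:,2); B = img(:,:,3);

// White Patch: constants from a region known to be white
Rw = mean(R(100:150, 200:250));                 // hypothetical patch coordinates
Gw = mean(G(100:150, 200:250));
Bw = mean(B(100:150, 200:250));

// Gray World: constants from the average of the whole image
// Rw = mean(R); Gw = mean(G); Bw = mean(B);

balanced = img;
balanced(:,:,1) = min(R / Rw, 1);               // divide by the balancing constant
balanced(:,:,2) = min(G / Gw, 1);               // and clip anything above 1
balanced(:,:,3) = min(B / Bw, 1);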

We must remember though that these White Balancing methods won't work if the captured image is saturated, even in just a single color channel. Saturated pixels carry no usable color information and would throw off the calculation of the balancing factor.

Figure 1 below shows samples of captured images and their corresponding RGB channels. The blacked-out regions on the RGB channels indicate the areas of saturation, which won't be useful for our purposes. Therefore, in selecting the white patch we create a composite image (figure 1, last column) from the RGB channels that blacks out all the areas that are not usable. For the Gray World algorithm we simply exclude the saturated regions when calculating the white balancing factors.


Figure 1. Sample captured images with their RGB channels and a composite image
indicating (blacked out) the areas that are saturated.

Taking the saturated regions into account, the results of applying the White Patch and Gray World algorithms on an image containing the major hues (RGB) are shown in figure 2. First let us note that the results of both algorithms for the incandescent and fluorescent settings are obviously wrong. The resulting image takes on an extreme shade of blue (or red) which is clearly not the true color of the objects. This error is mainly due to the fact that the original images taken with these settings were mostly saturated in the blue channel. On the other hand, the images taken with the daylight and cloudy settings were seen to improve with the White Patch algorithm. In the daylight setting the original image is slightly bluish, and after processing it took on more accurate colors of the objects. This was also observed in the cloudy setting, which initially had a reddish or brownish tint that was corrected after processing. As for the Gray World algorithm, it was only seen to improve the images taken with the cloudy setting, resulting in a final image that displayed white as "white" even better than the White Patch algorithm.

Figure 2. Original image taken with different white balancing settings and
their respective results after applying White Patch and Gray World.

Next, figure 3 shows the results of applying White Patch and Gray World on images that have a dominant color, in this case blue. Again it is observed that white balancing the images taken with the fluorescent and incandescent settings resulted in a dark image. But even though these processed images are darker, they are still improvements on the originals, since the original pictures are very bluish and are actually worse representations of the true colors of the scene. As for the daylight setting, we first note that the original image already seems to be properly white balanced, and we see very little difference between the original and the result of the White Patch algorithm. Looking at the result of the Gray World algorithm, however, we see that the image lost its yellowish color after processing, and it is tough to determine which is the more correct result. Finally, the image taken with the cloudy setting has a very yellowish color which is obviously not correct. After applying White Patch we see that the yellowish color is removed, which is a significant improvement. After applying Gray World the final image also completely lost its yellowish shade but became darker.

Figure 3. Original image taken with different white balancing settings and their respective results after applying White Patch and Gray World. The images have blue as the dominant color.

Overall the results suggest that in general the White Patch algorithm gives the better white balancing. The result of the Gray World algorithm is seen to be highly dependent on the dominant color of the image.

I thank Ate Cherry for lending us her camera and Irene for taking the pictures. I give myself a grade of 10 in this activity.

Activity 10 Preprocessing Text

In our past few activities we learned various ways of manipulating images, both in the frequency domain and morphologically. This time we will use all that we have learned to extract handwriting and other text from a scanned document.


Figure 1. (Left) Original document and (Right) the schematic
on how the tilt angle was calculated.

Looking at the given document we first notice that it is slightly rotated, so our first task is to straighten it. We do this by using the 'rotate' option of the mogrify() function in Scilab, a function that edits or distorts an image in numerous ways; in this case we use it for rotation. But before we rotate, we must first determine the tilt angle of the image. We measure this angle by drawing a horizontal line across the ruled lines of the document and calculating the tilt by simple trigonometry (figure 1). With this method the angle we calculated was 1.05 degrees. The results of the rotation are shown in figure 2.
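The sketch below shows how the tilt can be estimated and corrected; the two endpoint coordinates and the filename are hypothetical, and I assume SIP's mogrify() accepts its options as a vector of ImageMagick-style strings.

// Estimate the tilt angle from two points picked along one horizontal ruling
x1 = 50;  y1 = 212;                             // hypothetical pixel coordinates
x2 = 650; y2 = 223;
theta = atan(y2 - y1, x2 - x1) * 180 / %pi;     // tilt in degrees, about 1.05 here

// Rotate the document back by the measured angle
doc = imread('document.jpg');                   // hypothetical filename
rectified = mogrify(doc, ['-rotate', string(-theta)]);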


Figure 2. The result of rectifying the whole image and also a selected area.

We also note that the horizontal ruled lines of the image are not needed and that they interfere with our objective. To remove these lines we first invert the image (white over black), convert it to gray-scale, and then apply a filter or mask in frequency space. This is similar to what we did in activity 7: since we already know that the FT of horizontal lines lies along the vertical axis, we simply create a mask that removes this portion of the image's FT. After applying the mask we binarize the image to remove the background. The mask and resulting images are shown in figure 3; the Fourier transform of the image is also included.
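Below is a rough sketch of this filtering step, assuming SIP's gray_imread returns values in [0, 1]; the strip width and the final threshold are hypothetical.

// Remove the horizontal rulings by masking a vertical strip in frequency space
gray = 1 - gray_imread('rectified.jpg');        // inverted gray-scale document
[nr, nc] = size(gray);

// Build the mask in centered coordinates: block the vertical strip, keep the DC term
mask = ones(nr, nc);
cx = round(nc/2);  cy = round(nr/2);
mask(:, cx-2:cx+2) = 0;
mask(cy-2:cy+2, cx-2:cx+2) = 1;

// fftshift() aligns the centered mask with the unshifted FT of the image
filtered = abs(fft2(fftshift(mask) .* fft2(gray)));
clean = im2bw(filtered / max(filtered), 0.4);   // hypothetical threshold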


Figure 3. (Top) The inverted image with its Fourier transform. (Bottom) On the right is the mask used to filter out the lines, and on the left is the resulting binarized image after filtering.

After removing the lines we now go on to extracting the handwriting. We cut out a select few regions of the image from which we will extract the handwriting. Our first step is to binarize the selected regions using a threshold, just like in the past activities. We then apply some morphological operations hoping to obtain clear, readable, and distinct letter characters separated from each other, and label each character using bwlabel(). Unfortunately this task is very difficult to accomplish. The handwriting throughout the document overlaps with unwanted structures, and since the handwriting also has a lower gray value than the unwanted structures, separating them becomes even more difficult. Furthermore, if we are not careful the morphological operations destroy the letters instead of making them more distinct. Figure 4 shows the results after morphology; here I used the skel() function of Scilab to reduce the characters to single-pixel width. The results show that the final product does not actually look like letters, but the traces are still similar to the original handwriting and, to some degree, each character was separated from the others.
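A condensed sketch of the steps for one region follows, continuing from the filtered image of the previous sketch; the crop coordinates, the structuring element, and the thresholds are stand-ins rather than the actual values I tried.

// Clean up one handwritten region and label its characters
crop = filtered(810:950, 120:520);              // hypothetical region coordinates
bw = im2bw(crop / max(crop), 0.5);              // hypothetical threshold
se = ones(2, 2);                                // small square structuring element
bw = dilate(erode(bw, se), se);                 // opening: remove specks around strokes
sk = skel(bw);                                  // gray-scale skeleton of the strokes
letters = im2bw(sk / max(sk), 0.3);             // threshold picks how much detail is kept
L = bwlabel(letters);                           // label each connected character
n = max(L);                                     // number of characters found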



Figure 4. (Left) The selected area of the document with handwriting and (Right) the result of extracting the handwriting using different morphological operations.

Our last task is to find instances of the word "description" within the document. We did this by taking a template of the word and correlating it with the image. The correlation is done as in activity 5: we multiply the FT of the template with the conjugate of the FT of the image and then take the FT of this product. If the correlation is correct it results in an image that has its highest values at the locations of the word "description". Figure 5 shows the template and the result of locating the instances of the word. Comparing the image with the result of the correlation, we see that, as expected, the correlation peaks at the locations of the word "description". This means that our method for locating a template works even if the quality of the image is not ideal.
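A sketch of the correlation, assuming the binarized document and the template (padded to the same size) are stored as separate image files:

// Locate the word "description" by correlation in Fourier space
text = gray_imread('document_binary.png');      // hypothetical binarized document
tmpl = gray_imread('description_pad.png');      // template padded to the same size
corr = abs(fftshift(fft2(fft2(tmpl) .* conj(fft2(text)))));
peaks = corr / max(corr) > 0.9;                 // hypothetical cutoff; bright spots mark the word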


Figure 5. The binarized image and mask used to locate the instances of the word description. The result of the correlation indicating the instances is also shown at the bottom.

I give myself a grade of 9 in this activity.

Activity 9 Binary Operations

In image processing it is often more efficient if we are able to separate our region of interest (ROI) from the rest of the image. The simplest way of doing this is by binarizing the image using a threshold. But even in binarized images, having multiple ROIs presents some problems for the analysis: overlaps among ROIs or noise introduced by insufficient thresholding hinder the analysis of the image. Such problems can be addressed using different morphological operators, such as those we explored in the previous activity. The opening operator can be used to separate ROIs that overlap, while the closing operator is used to fill in holes within the ROI. In this activity we make use of different morphological operations to improve an image of multiple disks/circles for area estimation.

To do the opening operation we first erode the image then dilate it. This has the effect of separating objects that were initially adjacent or slightly overlapping. The closing operation, on the other hand, first dilates the image then erodes it; its effect is to fill in holes in solid objects.
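With SIP's erode() and dilate() each operation is just two nested calls; the image, threshold, and structuring element radius below are assumed values.

// Opening and closing with SIP's erode() and dilate()
bw = im2bw(gray_imread('circles.jpg'), 0.85);    // hypothetical image and threshold
[x, y] = meshgrid(-4:4, -4:4);
se = 1*(x.^2 + y.^2 <= 4^2);                     // circular SE of assumed radius 4
opened = dilate(erode(bw, se), se);              // opening: erode then dilate
closed = erode(dilate(bw, se), se);              // closing: dilate then erode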


Figure 1. Original gray-scale image of disks.

Our image is a gray-scale ensemble of circular disks, some of which are overlapping (figure 1). In this activity we aim to estimate the area of a single disk by taking the mean and standard deviation of the measured areas. Our first step is to cut the image into smaller images to make it more manageable for Scilab. We cut the image such that the slices overlap; this increases our sampling and prevents loss of information. Next we binarize each slice to exclude the background and highlight the disks. Then we do opening and closing operations to separate overlapping disks and also remove the noise left after binarizing. The structuring element I used is also a circle, but much smaller than those in the image. Finally, for each slice we label each white area using the function bwlabel() and then measure and record its area. The results for each step are shown in figure 2.
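The measurement loop over the slices looks roughly like this; the slice size, overlap, threshold, and SE radius are assumptions of mine.

// Measure blob areas over overlapping slices of the disk image
img = gray_imread('circles.jpg');                // hypothetical filename, values in [0,1]
[xx, yy] = meshgrid(-4:4, -4:4);
se = 1*(xx.^2 + yy.^2 <= 16);                    // circular SE smaller than the disks
win = 256;  step = 128;                          // slice size and overlap (assumed)
areas = [];
for r0 = 1:step:(size(img,1) - win + 1)
    for c0 = 1:step:(size(img,2) - win + 1)
        slice = im2bw(img(r0:r0+win-1, c0:c0+win-1), 0.85);
        slice = dilate(erode(slice, se), se);    // opening
        slice = erode(dilate(slice, se), se);    // closing
        L = bwlabel(slice);
        for k = 1:max(L)
            areas($+1) = length(find(L == k));   // pixel count of each labeled blob
        end
    end
end
// histplot(50, areas) shows the distribution; outliers are removed before
// taking mean(areas) and stdev(areas)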


Figure 2. From left to right: a slice of the image, a binarized version of the slice,
image after morphological operations, and the gray-scale labeled slice.

After completing the area measurements we express our results in the form of a histogram (figure 3, left). Looking at the histogram we see that there are area measurements that are extremely large or small. Intuitively we know that these are not correct measurements; they are mainly due to disks that were not properly separated or disks that were cut when we sliced the image. These measurements are outliers and we can discard them when we calculate the mean and standard deviation of the area. Another histogram, excluding these outliers, is shown on the right side of figure 3.


Figure 3. (Left) Histogram of all the measured areas and
(Right) histogram of the measured areas excluding the outliers.

We now calculate the mean and the standard deviation of the measured areas excluding the outliers. Doing so we obtain a value of 501.8585 for the mean and 70.24116 for the standard deviation. These values are well within expectations, since the disks in the raw image have radii ranging from 12 to 14 pixels, corresponding to areas from about π(12)² ≈ 452 to π(14)² ≈ 616 pixels. The mean area is our best estimate of the area of a disk, whereas the standard deviation provides the range within which we can consider areas equal, or objects the same. That is, the area of a disk is 501.8585 ± 70.24116 pixels.

I will give myself a 10 in this activity. It was difficult but I think the results I obtained were accurate enough.

Activity 8 Morphological Operations




Our previous activities dwelled on operations, enhancements, and other image processing techniques done in Fourier space. This time our activity involves morphological operations to augment or distort binary images. In this activity we will be 'eroding' and 'dilating' binary images of a square, triangle, circle, hollow square, and cross using different structuring elements. But before we do these operations in Scilab we first make some manual, 'pen and paper' predictions.

A structuring element (SE) can be thought of as the distorting pattern used in morphological operations. The dilate operation increases the size of the dilated object whereas the erode operation decreases its size; both operations follow the shape of the structuring element. The result of a dilation can be thought of as the collection of points that the center of the SE can occupy such that the SE and the dilated object intersect. On the other hand, the result of an erosion is the collection of points the center of the SE can occupy such that the whole SE is inside the object.

The structuring elements we use in this activity are a 4x4 square, 2x4 and 4x2 rectangles, and a cross 1 pixel thick and 5 pixels long per arm. We apply the morphological operations to five shapes, namely: a 50x50 square, a 50x30 triangle (base x height), a circle of radius 25, a 60x60 hollow square with 4-pixel-thick sides, and a large cross 8 pixels thick and 50 pixels long per arm.
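For reference, here is how some of these shapes and SEs can be built as 0/1 matrices and fed to SIP's dilate() and erode(); the canvas size is an arbitrary choice of mine.

// Build a few of the shapes and SEs as binary matrices
canvas = zeros(101, 101);
square = canvas;  square(26:75, 26:75) = 1;      // 50x50 square
[x, y] = meshgrid(1:101, 1:101);
circle = 1*((x - 51).^2 + (y - 51).^2 <= 25^2);  // circle of radius 25

se_square = ones(4, 4);                          // 4x4 square SE
se_cross = zeros(5, 5);                          // cross SE, 1 pixel thick, 5 pixels long
se_cross(3, :) = 1;  se_cross(:, 3) = 1;

dil = dilate(square, se_square);                 // grows the square following the SE
ero = erode(circle, se_cross);                   // shrinks the circle following the cross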

Figure 1 shows the result of the dilation of these shapes. My predictions for the square, circle, hollow square, and cross were all correct; I was able to predict both the shape and the actual size (pixel counts) perfectly. The square SE increased the sizes of these four shapes uniformly along the horizontal and vertical, so all of them retained their general shape except for the circle, which became slightly squarish. The rectangular SEs distorted the shapes more along their dominant axis. Finally, the cross SE left notches in the corners of the shapes, following the shape of the cross, while the circle had a slightly greater increase along the horizontal and vertical.

I only made a mistake in predicting the result for the triangle. I got the general shape right but the actual dimensions were off; I was confused about how the edges and corners of the triangle would turn out. Compared to the actual results, my predictions were slightly smaller.


Figure 1. Result of the dilation of five shapes (left column)
with four different structuring elements (top row).

Next, figure 2 shows the result when the shapes are eroded. Again my predictions were mostly correct; this time I was only wrong for the circle eroded with the cross SE. The results of erosion using the square and rectangle SEs were simply the opposite of their dilations. The interesting results here are for the cross SE on the hollow square and on the cross. As I predicted, only the corners of the hollow square can fit the cross SE, hence its edges vanish upon erosion. As for the cross SE on the cross, it is interesting to see that the resulting shape is thicker near the center. This is because near the intersection of the cross there are more points where the cross SE can fit.


Figure 2. Result of the erosion of five shapes (left column)
with four different structuring elements (top row).

The last part of this activity involves exploring the thin and skel functions of Scilab. Figure 3 shows an image included in the SIP package which we chose to apply thin and skel on. Figure 4 shows the result of thin: as its description says, thin removes the edges of a figure until only a skeleton of the original image remains. But the result seen in figure 4 is not actually a good skeleton of figure 3, which shows the limitations of thin. Finally, figure 5 shows the result of using the skel function on figure 3. skel is a better function than thin because (1) you can control what type of algorithm is used and (2) both exterior and interior skeletons can be obtained. The output of skel is also in gray-scale, which means we can control the sensitivity of the resulting skeleton using im2bw() with a variable threshold. Furthermore, skel is a function actually designed to take the skeleton of an image, unlike thin.
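A minimal usage sketch is shown below; the filename and thresholds are placeholders, and the algorithm and interior/exterior options of skel are extra arguments described in the Scilab help rather than spelled out here.

// thin and skel on a binary version of the test image
bw = im2bw(gray_imread('birds.jpg'), 0.5);       // hypothetical file and threshold
thinned = thin(bw);                              // binary skeleton, no options to tune
sk = skel(bw);                                    // gray-scale skeleton
skeleton = im2bw(sk / max(sk), 0.3);              // threshold controls the detail kept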


Figure 3. Image of Birds (Escher)


Figure 4. Result of applying thin on figure 3.

Figure 5. Result of applying skel on figure 3, taking both
exterior and interior skeletons and using the exact Euclidean algorithm.

Information on the functions skel and thin is included in the help and documentation of Scilab.

I give myself a grade of 10 in this activity.