Dennis Núñez

PhD (c) in AI and Neuroimaging. CEA / Inria / Université Paris-Saclay

Skin detection in real-time based on face skin color


In the last two decades, extensive research has focused on skin detection in images. Skin detection means detecting image pixels and regions that contain skin-tone color. Most of the research in this area has focused on detecting skin pixels and regions based on their color. Very few approaches also attempt to use texture information to classify skin pixels.

Detecting skin-colored pixels, although it seems a straightforward task, has proven quite challenging for many reasons. The appearance of skin in an image depends on the illumination conditions (illumination geometry and color) under which the image was captured. We humans are very good at identifying object colors under a wide range of illuminations; this is called color constancy. Color constancy is a mystery of perception. Therefore, an important challenge in skin detection is to represent color in a way that is invariant, or at least insensitive, to changes in illumination.

In this project, we propose an algorithm that takes advantage of face color information in order to segment skin regions. The face carries important information about skin properties, such as lighting conditions and skin tone. One of the main advantages of this method is therefore that it needs neither a database nor a training phase: the skin model is extracted from the face and constructed according to the current lighting and skin conditions of the person. As a result, the method automatically adapts the skin model to the current conditions of the environment.

Face detection

Object Detection using Haar feature-based cascade classifiers is an effective object detection method proposed by Paul Viola and Michael Jones in their paper, "Rapid Object Detection using a Boosted Cascade of Simple Features" in 2001. It is a machine learning based approach where a cascade function is trained from a lot of positive and negative images. It is then used to detect objects in other images.

The speed with which features may be evaluated does not adequately compensate for their number, however. For example, in a standard 24x24 pixel sub-window, there are a total of M = 162,336 possible features, and it would be prohibitively expensive to evaluate them all when testing an image. Thus, the object detection framework employs a variant of the learning algorithm AdaBoost to both select the best features and to train classifiers that use them. This algorithm constructs a “strong” classifier as a linear combination of weighted simple “weak” classifiers.
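The idea of a strong classifier built as a weighted combination of weak classifiers can be sketched as follows. This is a minimal illustration, not the trained detector: the `WeakClassifier` struct, its fields and the `strongClassify` function are hypothetical names, each weak classifier is assumed to be a decision stump on one feature value, and the votes use the common +1/-1 AdaBoost convention with the sign of the weighted sum as the decision.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical decision stump on a single feature value.
// It votes +1 when polarity * value < polarity * threshold, otherwise -1.
struct WeakClassifier {
    double threshold;
    int polarity;   // +1 or -1, flips the direction of the comparison
    double alpha;   // weight that boosting assigned to this stump

    int vote(double featureValue) const {
        return (polarity * featureValue < polarity * threshold) ? 1 : -1;
    }
};

// Strong classifier: sign of the alpha-weighted sum of the weak votes.
// featureValues[i] is the feature value consumed by stumps[i].
bool strongClassify(const std::vector<WeakClassifier>& stumps,
                    const std::vector<double>& featureValues) {
    assert(stumps.size() == featureValues.size());
    double score = 0.0;
    for (std::size_t i = 0; i < stumps.size(); ++i) {
        score += stumps[i].alpha * stumps[i].vote(featureValues[i]);
    }
    return score >= 0.0;
}
```

In a real detector the thresholds, polarities and weights come from training; here they would simply be supplied by the caller.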

Figure 1: Typical Viola-Jones flowchart

All human faces share some similar properties. These regularities may be matched using Haar Features. Each classifier is composed of Haar feature extractors (weak classifiers). Each Haar feature is the weighted sum of 2-D integrals of small rectangular areas attached to each other. The weights may take values ±1. Figure 2 shows examples of Haar features relative to the enclosing detection window. Black areas have a positive weight and white areas have a negative weight.

Figure 2: Haar features used for face detection
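The rectangle sums behind a Haar feature are usually computed with an integral image (summed-area table), so that each rectangle costs four table lookups. The sketch below is an assumption-laden illustration: `IntegralImage` and `twoRectFeature` are hypothetical names, the image is a plain vector of rows, and the feature shown is a single two-rectangle feature with weight +1 on the left half and -1 on the right half.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Summed-area table with one extra row and column of zeros, so that
// rectangle sums need no special cases on the top and left borders.
class IntegralImage {
public:
    explicit IntegralImage(const std::vector<std::vector<int>>& img)
        : rows_(img.size()),
          cols_(img.empty() ? 0 : img[0].size()),
          table_((rows_ + 1) * (cols_ + 1), 0) {
        for (std::size_t y = 0; y < rows_; ++y) {
            assert(img[y].size() == cols_);  // image must be rectangular
            for (std::size_t x = 0; x < cols_; ++x) {
                at(x + 1, y + 1) =
                    img[y][x] + at(x, y + 1) + at(x + 1, y) - at(x, y);
            }
        }
    }

    // Sum of the pixels in the rectangle with top-left corner (x, y),
    // width w and height h, using four lookups.
    long long sum(std::size_t x, std::size_t y,
                  std::size_t w, std::size_t h) const {
        assert(x + w <= cols_ && y + h <= rows_);
        return at(x + w, y + h) - at(x, y + h) - at(x + w, y) + at(x, y);
    }

private:
    long long& at(std::size_t x, std::size_t y) {
        return table_[y * (cols_ + 1) + x];
    }
    long long at(std::size_t x, std::size_t y) const {
        return table_[y * (cols_ + 1) + x];
    }

    std::size_t rows_;
    std::size_t cols_;
    std::vector<long long> table_;
};

// Two-rectangle feature: (+1) * left half + (-1) * right half.
long long twoRectFeature(const IntegralImage& ii, std::size_t x, std::size_t y,
                         std::size_t w, std::size_t h) {
    assert(w % 2 == 0);  // the window is split into two equal halves
    const std::size_t half = w / 2;
    return ii.sum(x, y, half, h) - ii.sum(x + half, y, half, h);
}
```

For the 2x4 image with rows {1, 2, 3, 4} and {5, 6, 7, 8}, the left half sums to 14 and the right half to 22, so the feature over the whole image evaluates to -8.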

The cascade architecture is very efficient because the classifiers with the fewest features are placed at the beginning of the cascade, minimizing the total required computation. A cascade of gradually more complex classifiers achieves even better detection rates.

The strong classifiers generated by the learning process can be evaluated quickly, but not quickly enough to run in real time. For this reason, the strong classifiers are arranged in a cascade in order of complexity, where each successive classifier is trained only on those samples that pass through the preceding classifiers. If at any stage in the cascade a classifier rejects the sub-window under inspection, no further processing is performed and the search continues with the next sub-window. The cascade therefore has the form of a degenerate tree.

Figure 3: Cascade used for face detection
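The early-rejection behavior of the cascade can be sketched as a loop that stops at the first stage that rejects the sub-window. This is only an illustration under assumptions: `Window`, `Stage` and `runCascade` are hypothetical names, a sub-window is represented by a vector of precomputed values, and each stage is an arbitrary callable rather than a trained strong classifier.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins: a sub-window is a vector of precomputed values,
// and a stage is any callable that accepts or rejects it.
using Window = std::vector<double>;
using Stage = std::function<bool(const Window&)>;

// Returns true only if every stage accepts the window.
// If stagesEvaluated is non-null, it receives the number of stages that ran,
// which shows how early a rejected window leaves the cascade.
bool runCascade(const std::vector<Stage>& stages, const Window& window,
                int* stagesEvaluated = nullptr) {
    int evaluated = 0;
    bool accepted = true;
    for (const Stage& stage : stages) {
        assert(static_cast<bool>(stage));  // every stage must hold a callable
        ++evaluated;
        if (!stage(window)) {
            accepted = false;  // rejected: skip all remaining stages
            break;
        }
    }
    if (stagesEvaluated != nullptr) {
        *stagesEvaluated = evaluated;
    }
    return accepted;
}
```

Because most sub-windows in an image contain no face, most of them are expected to leave the loop after the first, cheapest stages, which is where the speed-up comes from.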

YCbCr color space

YCbCr is an encoded non-linear RGB signal, commonly used by European television studios and for image compression work. Color is represented by luma (luminance computed from non-linear RGB), constructed as a weighted sum of the RGB values. YCbCr is a commonly used color space in the digital video domain. In this format, luminance information is stored as a single component (Y), and chrominance information is stored as two color-difference components (Cb and Cr). Cb represents the difference between the blue component and a reference value. Cr represents the difference between the red component and a reference value. YCbCr is used in several papers for skin detection. YCbCr values can be obtained from the RGB color space by a linear transformation of the R, G and B values.
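One common form of the RGB to YCbCr transformation is sketched below. It assumes 8-bit full-range values, the luma weights 0.299, 0.587 and 0.114, and the chroma scaling and offset of 128 that the OpenCV documentation gives for its RGB to YCrCb conversion; other standards use slightly different constants. The names `YCbCr`, `clampToByte` and `rgbToYCbCr` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct YCbCr {
    std::uint8_t y;
    std::uint8_t cb;
    std::uint8_t cr;
};

// Rounds to the nearest integer and saturates to the 8-bit range.
inline std::uint8_t clampToByte(double v) {
    return static_cast<std::uint8_t>(
        std::min(255.0, std::max(0.0, std::round(v))));
}

// Assumed convention (8-bit, offset 128):
//   Y  = 0.299 R + 0.587 G + 0.114 B
//   Cb = (B - Y) * 0.564 + 128
//   Cr = (R - Y) * 0.713 + 128
YCbCr rgbToYCbCr(int r, int g, int b) {
    assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
    const double luma = 0.299 * r + 0.587 * g + 0.114 * b;
    const double cb = (b - luma) * 0.564 + 128.0;
    const double cr = (r - luma) * 0.713 + 128.0;
    return {clampToByte(luma), clampToByte(cb), clampToByte(cr)};
}
```

With this convention every gray pixel maps to Cb = Cr = 128, which illustrates how brightness is carried by Y while the two chroma components stay fixed.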

In contrast to RGB, the YCbCr color space separates luma from chrominance, so the Cb and Cr components are largely independent of brightness, which results in better performance. Figure 4 shows the different distribution of some colors, such as yellow, green, red and others, in the YCbCr color space.

Figure 4: YCbCr color space

Experimental tests show the efficiency of the YCbCr color space for the segmentation and detection of skin color in color images. Figure 5 demonstrates that the YCbCr color space is better than the RGB color space for detecting skin regions. Furthermore, the YCbCr image is more robust to variations in lighting conditions, so conversion from RGB to YCbCr is used in this project as a pre-processing step.

Figure 5: [Left] Original image extracted from the camera, [Center] Skin color thresholding using RGB color space, [Right] Skin color thresholding using YCbCr color space


Figure 6 shows the proposed system, which has four stages. First, face detection is performed using the Viola-Jones algorithm. Second, the image is converted from the RGB to the YCbCr color space, before the parameters are extracted and the skin color thresholding is applied. Third, the thresholds are calculated by analyzing the Y, Cb and Cr histograms. Finally, the filter stage, or classifier, performs skin detection using the thresholds obtained in the previous stage. Since the Viola-Jones algorithm, the conversion to YCbCr and the skin color thresholding are processed in about 50 ms, a real-time implementation is possible; it is written in C++ to reduce processing time.

Figure 6: Basic scheme of the skin detector

Since the face carries important information about lighting conditions and skin color, it is used in the YCbCr color space to extract the most important features for skin detection. Because the face size changes with the distance of the person in front of the camera, the face is scaled to a 100x100 image before the features are extracted. The parameters extracted from the face are the histogram intervals in the YCbCr color space where the skin regions are located. This means that six parameters are extracted: Y_min and Y_max are the boundaries of luminance, and Cb_min, Cb_max, Cr_min and Cr_max are the boundaries of chrominance. These limits determine the histogram intervals of the skin regions. These intervals are shown in Figure 7.
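The extraction of one histogram interval can be sketched as follows. The exact rule is not spelled out in the text, so this is an assumption: the interval of a channel is taken to run from the lowest to the highest intensity whose histogram count reaches a minimum count, here called `minVal`. The names `Interval` and `histogramInterval` are hypothetical, and the channel is a flat list of 8-bit values such as the Y, Cb or Cr plane of the 100x100 face image.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Inclusive bounds of one channel; lo > hi means that no bin qualified.
struct Interval {
    int lo;
    int hi;
};

// Builds the 256-bin histogram of one channel and returns the range spanned
// by the bins whose count is at least minVal (assumed rule, see lead-in).
Interval histogramInterval(const std::vector<std::uint8_t>& channel,
                           int minVal) {
    assert(minVal >= 1);
    std::array<int, 256> hist{};  // all counts start at zero
    for (std::uint8_t value : channel) {
        ++hist[value];
    }
    Interval result{255, 0};
    for (int value = 0; value < 256; ++value) {
        if (hist[value] >= minVal) {
            if (value < result.lo) result.lo = value;
            if (value > result.hi) result.hi = value;
        }
    }
    return result;
}
```

Calling this once per channel on the face region would give the six parameters; sparsely populated bins, for example those produced by a few dark eye or eyebrow pixels, fall below the minimum count and do not widen the interval.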

These six parameters (Y_min - Y_max, Cb_min - Cb_max and Cr_min - Cr_max) are then used to decide whether a pixel belongs to the skin or not. In Figure 7, pixels with YCbCr values inside the orange region are labeled as skin pixels and pixels outside the orange region are labeled as non-skin. The min_val parameter is used to discard regions such as eyes, noise holes, eyebrows and small non-skin regions inside the face in order to improve the skin model. As a final step, a post-processing technique is used to smooth the skin regions: a Gaussian blur filter with a kernel of 5x5 pixels. The whole system was implemented entirely in C++ using the OpenCV libraries.

Figure 7: Face region in YCbCr color space and its histogram intervals where skin is located
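The per-pixel decision with the six parameters amounts to a range test on each channel. The sketch below is a standard-library illustration rather than the OpenCV-based implementation: `Pixel`, `SkinModel`, `isSkin` and `skinMask` are hypothetical names, the image is a flat list of YCbCr pixels, and the Gaussian smoothing step is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Pixel {
    std::uint8_t y;
    std::uint8_t cb;
    std::uint8_t cr;
};

// The six interval boundaries extracted from the face (inclusive).
struct SkinModel {
    int yMin, yMax;
    int cbMin, cbMax;
    int crMin, crMax;
};

// A pixel is skin only if all three channels fall inside their intervals.
bool isSkin(const SkinModel& m, const Pixel& p) {
    return p.y >= m.yMin && p.y <= m.yMax &&
           p.cb >= m.cbMin && p.cb <= m.cbMax &&
           p.cr >= m.crMin && p.cr <= m.crMax;
}

// Produces a binary mask: 255 for skin pixels, 0 for non-skin pixels.
std::vector<std::uint8_t> skinMask(const SkinModel& m,
                                   const std::vector<Pixel>& image) {
    assert(m.yMin <= m.yMax && m.cbMin <= m.cbMax && m.crMin <= m.crMax);
    std::vector<std::uint8_t> mask;
    mask.reserve(image.size());
    for (const Pixel& p : image) {
        mask.push_back(isSkin(m, p) ? 255 : 0);
    }
    return mask;
}
```

Because the model is rebuilt from the detected face, the same function can be reused unchanged when the lighting or the person in front of the camera changes; only the six boundary values differ.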


In order to test the performance of our skin recognition system, we used a group of images downloaded randomly from Google for human skin detection research. These images were captured with a range of different cameras, using different colour enhancement and under different illuminations. The results are shown in Figure 8. As the figure shows, the proposed method achieves good results despite different orientations of the face, different lighting conditions and different skin tones.

Figure 8: Skin detection for single images

Since the computational time needed to detect the face and find the skin intervals of the histogram is low, a real-time implementation of this skin classifier is feasible. The video below demonstrates good classification of most of the skin regions, although some skin regions are not labeled as skin due to low lighting conditions. The video also shows, in real time, the whole camera image in the YCbCr color space, the face region in the YCbCr color space and its histogram.

On the other hand, this method detects skin only in direct human-machine interaction: the person must be in front of the camera and the face must be visible, otherwise it would be impossible to extract facial features. Despite these restrictions, the video shows a good classification of skin, and other regions of the body, such as the hand, the arm or other limbs, can be extracted.

Skin detector in real-time running on a laptop machine


A method for skin detection was proposed in this project. What makes it different from other methods is the use of the face to find skin regions. Since the face contains useful and important information about the skin color and the lighting conditions of the environment, it is used to create an adaptive model of the skin according to the current conditions of the skin and the environment. Furthermore, this method can be improved by adding better post-processing techniques such as morphological transformations or a median blur filter. Some possible applications are hand detection for recognizing hand poses, human-machine interaction with robots or PCs, monitoring of the driver's body behavior in order to avoid accidents caused by falling asleep while driving, body behavior analysis for surveillance, and more.