IMAGE CLASSIFICATION FOR JPEG COMPRESSION

We analyse storage problems of digital images with respect to image quality and compression efficiency. Such problems are relevant for Cloud storage and file hosting services, online file storage providers, social networks, etc. In this paper, an approach is proposed to process a group of images with the JPEG algorithm so that all the processed images satisfy a minimum quality threshold, with automatic selection of the quality factor (QF). The experimental investigation reveals the advantage in compression efficiency of the proposed approach over the conventional JPEG algorithm. The proposed approach saves storage space while maintaining the desired image quality.


INTRODUCTION
Nowadays, problems of digital data storage, processing, and information presentation are especially relevant. The image storage techniques analysed in this paper can be used for various hosting and Cloud services, online file storage providers, social networks, etc. In this paper, we investigate digital images captured by digital cameras and the efficient storage of these images. We aim to design an approach for storing images in JPEG format that is more efficient than the existing ones. For this purpose, we classify images according to their properties in order to maintain a set quality of the images.
The JPEG image compression standard and its basic principles were proposed many years ago [16]; nevertheless, the standard is still widely used and remains the most popular algorithm for image compression. The quality factor (QF) is the main parameter that influences the image quality after JPEG compression, and it also determines the compression ratio. This parameter is an integer between 0 and 100 that is used to parameterize the quantization matrix. The greater this number is, the less information is lost. The problem is that the same QF value can influence the quality of each image differently when the quality is assessed by full-reference measures [2,8]. The paper [15] shows that compressing different images by the JPEG algorithm with the same compression factor yields different compression efficiency. In the paper [13], it was shown that the image quality after processing by the JPEG algorithm depends on the image content. The image quality was assessed by the following measures: Compression Ratio (CR), Signal to Noise Ratio (SNR), Peak Signal to Noise Ratio (PSNR), Mean Square Error (MSE) [11], and the Structural Similarity (SSIM) index [17].
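To illustrate how QF parameterizes the quantization matrix, the following minimal sketch scales the example luminance quantization table from Annex K of the JPEG standard. The scaling rule is the one used by the IJG libjpeg library; the paper does not state which JPEG implementation is used, so this choice is an assumption, and the function name `scaled_quant_table` is hypothetical. Note that libjpeg clamps the quality value to the range 1..100.

```python
# Example luminance quantization table from Annex K of the JPEG standard.
BASE_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def scaled_quant_table(qf, base=BASE_LUMA):
    """Scale a base quantization table by a quality factor.

    Assumption: the IJG libjpeg convention is followed; other encoders
    may map QF to quantization steps differently.
    """
    qf = max(1, min(100, int(qf)))  # clamp QF to 1..100, as libjpeg does
    # Percentage scale: above 100 for QF < 50, below 100 for QF > 50.
    scale = 5000 // qf if qf < 50 else 200 - 2 * qf
    # Round to the nearest integer and keep each step in the baseline range 1..255.
    return [max(1, min(255, (q * scale + 50) // 100)) for q in base]
```

Under this convention, QF = 50 reproduces the base table, and larger QF values give smaller quantization steps, which is why less information is lost.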
One of the most popular approaches to image storage is based on repeated quality assessment of each compressed image and/or repeated transcoding operations. Here, the compression algorithm is applied several times to each image, the quality of the compressed image is assessed each time by quality assessment measures, and the compression settings are selected according to the obtained results [3]. The papers [1,9,10] describe optimal and near-optimal quality transcoding systems using predictive quality factor and scaling parameters, where measurements and/or transcoding operations must be computed repeatedly. In this way, storage space can be saved while maintaining high image quality. However, these approaches are time-consuming.
We suggest employing computational intelligence techniques to predict the image quality before a compression algorithm is applied. A classification-based approach to image storage was analysed in the paper [14]. There, the influence of the JPEG algorithm on the image quality was predicted using Linear Discriminant Analysis (LDA) [5] by classifying images into two classes. The first class consists of images whose quality does not change significantly after compression (high-quality). The quality of the second-class images changes significantly after compression (low-quality). Compared with conventional JPEG, this classification-based approach saves 15% of storage space while maintaining the user-predefined quality. However, classification into only two classes is not sufficient from the user's point of view. Moreover, the problem of image feature selection should be solved to improve classification quality. This paper presents an approach to image storage in which the images are classified into three classes.
The remainder of this paper is organized as follows. The proposed classification-based image compression approach is described in Section 2. The results of the experimental investigation are presented in Section 3. Finally, conclusions are drawn in Section 4.

PROPOSED CLASSIFICATION-BASED COMPRESSION APPROACH
The paper proposes an approach to process a group of images by the JPEG algorithm so that all the processed images satisfy the minimal quality threshold defined by a user and the QF value is selected automatically. The proposed approach allows predicting how the JPEG algorithm will affect the image quality. Within the approach, a classification problem is solved in order to group images into three classes, depending on how strongly the JPEG algorithm will affect their quality. The process of the proposed JPEG image compression approach is illustrated in Figure 1. It should be noted that the training process is performed once, whereas the compression is performed on demand.
The quality threshold is calculated by using quality assessment measures. For objectivity, we select the two least correlated [3] quality measures: PSNR, which analyses the difference between pixels, and the SSIM index, which estimates the overall image changes. Quality thresholds are set at the intersection of these two measures (Figure 2).
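For reference, the pixel-difference measure mentioned above can be sketched as follows. This is a minimal illustration using the standard definitions of MSE and PSNR, assuming 8-bit pixel values (peak value 255) given as flat sequences; the function names are hypothetical, and the SSIM index [17] is not reproduced here because its windowed computation is considerably longer.

```python
import math


def mse(original, compressed):
    """Mean squared error between two equally sized sequences of pixel values."""
    if len(original) != len(compressed) or len(original) == 0:
        raise ValueError("images must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)


def psnr(original, compressed, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(original, compressed)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / error)
```

A higher PSNR indicates a smaller pixel-wise difference between the original and the compressed image.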
To set classes for classifier training, we comply with the following rules:
• the images whose quality has changed the least after the JPEG algorithm are assigned to the first class (SSIM > q_s1 and PSNR > q_p1);
• the images whose quality has changed the most are assigned to the second class (SSIM < q_s2 and PSNR < q_p2);
• the images whose quality has changed moderately are assigned to the third class (q_s2 < SSIM < q_s1 and q_p2 < PSNR < q_p1).
It should be noted that not all the images fall into one of these classes (for example, an image with SSIM < q_s2 and PSNR > q_p1 belongs to none of them). However, these three sets are selected in order to train the classifiers more accurately.
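The labelling rules above can be sketched as a small function. The threshold values are those reported for the classifier in Table 1 (0.94, 37, 0.92, 32); the function name `training_class` and the use of `None` for images that belong to no class are illustrative assumptions.

```python
from typing import Optional

# Quality thresholds reported in the paper (Table 1).
Q_S1, Q_P1 = 0.94, 37.0  # upper thresholds for SSIM and PSNR
Q_S2, Q_P2 = 0.92, 32.0  # lower thresholds for SSIM and PSNR


def training_class(ssim: float, psnr: float) -> Optional[int]:
    """Return the training class (1, 2 or 3) or None if no rule applies."""
    if ssim > Q_S1 and psnr > Q_P1:
        return 1  # quality changed the least
    if ssim < Q_S2 and psnr < Q_P2:
        return 2  # quality changed the most
    if Q_S2 < ssim < Q_S1 and Q_P2 < psnr < Q_P1:
        return 3  # quality changed moderately
    return None  # the two measures disagree, so the image is left out
```

For example, an image with SSIM = 0.90 and PSNR = 40 satisfies none of the three rules and would therefore not be used for training.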

EXPERIMENTAL INVESTIGATION
The digital image database SUN2012 [18], which consists of 16 873 different images, is used. For the experimental investigation, the images with dimensions not smaller than 1024 × 768 pixels are selected (the total number of such images is 2 963).

Classifier creation
For an initial assignment of the images to classes, the images are processed by the JPEG algorithm. The measures of image quality are computed: the SSIM values range from 0.802 to 0.999, and the PSNR values range from 25.7 to 57.3. In total, 450 images (150 images of each class) are selected to train a classifier. LDA is applied to image classification. The classification accuracy is evaluated using 10-fold cross-validation. The numerical values of the quality thresholds q_s1, q_s2, q_p1, q_p2 for the training sets are determined experimentally so that the classification accuracy is as high as possible. The options of the classifier are presented in Table 1.
Image classification results depend on the set of features describing the images. Thus, the problem of selecting image features for a higher classification accuracy must be solved. To obtain numerical values of image features, various geometric transformations, region and image properties, and texture analysis are commonly applied [4,6,7,12].
In this experimental investigation, the following 55 different image features are used: the means and standard deviations of pixel values, entropy, histograms (in RGB, HSV and YCbCr), the number of bits per pixel, and the number of different regions of similar colour in the images processed by global image thresholding (see Figure 3).
The correlation coefficients of the 55 features are calculated, and the 16 least correlated features, whose correlation coefficients are less than 0.7, are used for image classification. After 10-fold cross-validation, a classification accuracy of 0.76 is obtained when the images are classified into three classes. The confusion matrix (Figure 4) shows that the first and second classes do not overlap. This means that the classifier does not confuse high-quality images with low-quality images after compression.

Classifier options (Table 1):
• first class: SSIM > q_s1 and PSNR > q_p1;
• second class: SSIM < q_s2 and PSNR < q_p2;
• third class: q_s2 < SSIM < q_s1 and q_p2 < PSNR < q_p1;
• quality thresholds: q_s1 = 0.94, q_p1 = 37, q_s2 = 0.92, q_p2 = 32.
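The correlation-based feature filtering described in this subsection can be sketched as follows. The paper does not specify how the least correlated features are picked, so the greedy rule below (keep a feature only if its absolute Pearson correlation with every feature kept so far is below 0.7) is an assumption, as are the function names and the handling of constant features.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # assumption: a constant feature is treated as uncorrelated
    return cov / (dev_x * dev_y)


def select_uncorrelated(features, threshold=0.7):
    """Greedily keep features whose |correlation| with all kept ones is below threshold.

    `features` maps a feature name to its values over all images.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

With this rule the result depends on the order in which the features are visited, so a different ordering may retain a different subset.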

Comparison of the proposed approach and conventional JPEG algorithm
The proposed approach is compared with the conventional JPEG algorithm in order to estimate the storage space of the compressed images. Two cases are investigated.
In the first case, it is desired to obtain high-quality images after compression. For this purpose, we use the obtained quality thresholds (see Table 1) and the following quality requirements: SSIM > 0.94 and PSNR > 37.
In the second case, it is desired to obtain middle-quality images after compression. The following quality requirements are used: SSIM > 0.92 and PSNR > 32.
It is identified experimentally that, in order to achieve the predefined quality threshold with the conventional JPEG algorithm, it is necessary to apply QF = 95 in the first case and QF = 85 in the second case. To achieve the predefined thresholds with the proposed approach, the QF values are set depending on the image class. In the first case, QF = 50 for the first-class images, QF = 95 for the second-class images, and QF = 85 for the third-class images. In the second case, QF = 40 for the first-class images, QF = 85 for the second-class images, and QF = 65 for the third-class images (see Table 2).
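The class-dependent choice of QF described above amounts to a lookup table. The sketch below uses the QF values reported in the text; the case labels `"high"` and `"middle"`, the function name `choose_qf`, and the fallback to the conventional QF for an unknown class are illustrative assumptions.

```python
# QF per predicted class, for the two investigated cases (values from the text).
QF_BY_CASE = {
    "high": {1: 50, 2: 95, 3: 85},    # requirements: SSIM > 0.94 and PSNR > 37
    "middle": {1: 40, 2: 85, 3: 65},  # requirements: SSIM > 0.92 and PSNR > 32
}

# QF needed by the conventional JPEG algorithm in each case.
CONVENTIONAL_QF = {"high": 95, "middle": 85}


def choose_qf(predicted_class, case):
    """Return the QF for an image given its predicted class and the quality case."""
    # Assumption: an unknown class falls back to the conventional (safe) QF.
    return QF_BY_CASE[case].get(predicted_class, CONVENTIONAL_QF[case])
```

In both cases the second-class images receive the same QF as the conventional algorithm, so the storage saving comes from the first-class and third-class images. With an imaging library such as Pillow, the chosen value could be passed as the `quality` argument when saving a JPEG file; the paper does not state which tool was used.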
The experiment is repeated 10 times for each case using different sets of images. Each time, the set contains 500 randomly selected images that are used for class prediction and storage space evaluation. The average amount of image storage space after compression and the average number of images that do not satisfy the predefined quality requirements are presented in Table 3. The average amount of space saved by the proposed approach is also given there. The results show that, when applying the quality requirements and the QF values presented in Table 2, about 26% of the storage space is saved by the proposed approach compared with the conventional JPEG algorithm. Moreover, only about 4% of the images do not meet the predefined quality. It should be noted that the quality of these images is very close to the quality thresholds, i.e. it does not differ significantly from the desired level. Other QF values (different from those in Table 2) can also be used. Increasing the QF values for the different classes will lead to fewer images failing the quality requirements because of misclassification. However, in this case, the storage space saved compared with the conventional JPEG algorithm will decrease.
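The two quantities reported for each experiment can be computed as sketched below. The function names are hypothetical, the definitions are our reading of "saved space" and "images which do not satisfy the requirements", and any numbers passed to these functions in an example are toy values, not results from the paper.

```python
def saved_space_percent(conventional_sizes, proposed_sizes):
    """Percentage of storage saved by the proposed approach for one image set."""
    total_conventional = sum(conventional_sizes)
    if total_conventional <= 0:
        raise ValueError("conventional sizes must sum to a positive value")
    return 100.0 * (total_conventional - sum(proposed_sizes)) / total_conventional


def failure_percent(measurements, min_ssim, min_psnr):
    """Percentage of images whose (SSIM, PSNR) pair misses the requirements."""
    failed = sum(
        1 for ssim, psnr in measurements
        if not (ssim > min_ssim and psnr > min_psnr)
    )
    return 100.0 * failed / len(measurements)
```

Averaging these two values over the 10 repetitions gives figures of the kind summarised in Table 3.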

CONCLUSIONS
In this paper, a computational intelligence-based approach to the storage of compressed images has been proposed and investigated. Here, a three-class image classification problem has been solved to predict the effect of JPEG compression on image quality.
The approach allows processing large groups of images by the JPEG algorithm so that the QF value is selected automatically for each image while satisfying the desired image quality. For this purpose, classification by image features is applied, where the images are classified into three classes, taking into account the objective quality of the images. Suitable features describing the images have been selected. The resulting classification accuracy is 0.76. Such accuracy is high enough for the case of three classes.
Moreover, the confusion matrix has shown that the images of the first (high-quality) and second (low-quality) classes are not confused with each other.
In order to demonstrate the advantage of the proposed approach, comparative experimental investigations have been carried out. The experiments have shown that the proposed approach saves about 26% of digital image storage space compared with the conventional JPEG algorithm, while maintaining the desired image quality. The image quality is assessed by full-reference measures: PSNR and the SSIM index. When using the proposed approach, the desired quality is maintained for about 96% of the images.
The proposed approach is designed to store various digital images, captured by digital cameras, and can be applied to Cloud storage and file hosting services, online file storage providers, social networks, etc.
In further investigations, it would be worthwhile to develop a general-purpose method for image storage using computational intelligence techniques, which could also be used for images of specific content and formats, e.g. medical images, GIS images, etc.

Fig. 1. The process of the proposed approach for storage of image groups: a) the training process, b) the compression

High-quality images are not confused with low-quality images after compression. An image can, however, fall into either the high-quality or the middle-quality class, and the same can occur with the middle-quality and low-quality classes. This happens when the quality of the misclassified images is near the thresholds. Nevertheless, such class overlapping does not influence the classification accuracy significantly.

Table 1. Options of the classifier

Table 2. The quality requirements and the QF values

Table 3. Comparison of the proposed approach and conventional JPEG algorithm (the average of 10 experiments)