Nonlinear Nonlocal Algorithm for Video Filtering

Video sequences are frequently contaminated by noise during the acquisition process, resulting in considerable degradation of video display quality. In this paper, we present a novel method of video filtering. The proposed filter is derived from an optimization problem in which a Bayesian term and a prior distribution of the noisy video sequence are combined. The method begins by segmenting the video sequence into space-time blocks and then substitutes each noisy block with a weighted average of nonlocal neighbor blocks. Gradient-based weights dynamically adjust the edge preservation and smoothness of the reference block. The obtained formulation enables nonlinear filtering and hence preserves key features such as edges and corners, while retaining the intrinsic Bayesian filtering framework. Experiments on different video sequences with varying degrees of noise show that the proposed method outperforms state-of-the-art video filtering approaches.


INTRODUCTION
During the acquisition process, video sequences are often contaminated by noise, degrading picture and video display quality significantly and adversely affecting the performance of post-processing tasks such as motion tracking, object detection, feature extraction, and pattern recognition. Numerous noise reduction techniques have been published in recent years, and the majority can be classified into two broad categories, according to whether noise is reduced by exploiting spatial as well as temporal correlations between frames [1], or by employing nonlinear diffusion techniques to enhance edges while the noise is reduced [2]. In what follows, we highlight the most important methods from each category.
From the first category, Yan et al. [3] propose a temporal filtering technique based on extracting and modeling noise in consecutive frames of video sequences. The filter was shown to be effective in reducing noise in real-time video sequences, but suffers from dragging effects on moving objects. Another filter, based on temporal data blocks, is presented in [4]; it is efficient at reducing noise while minimizing blocking artifacts, but it does not prevent the formation of blurred edges. Because both spatial and temporal filters may independently produce blurring in motion regions, the idea of combining space and time information to avoid temporal artifacts arises. For example, [5] presents a method for reducing video noise based on the space-time correlations between adjacent frames; to obtain appropriate results, Kalman and bilateral filtering are integrated into the filter. Another method, based on a spatial-temporal combination, is presented in [6]; it discriminates between the static regions and the moving regions of video sequences. The authors of [7] proposed a technique for reducing noise in videos affected by random and fixed-pattern noise by using motion-compensated 3D space-time volumes.

To enhance filtering, especially in large-scale TV noise regions, [27] developed an adaptive filter that adapts its temporal and spatial characteristics in real time. The authors of [28] suggest a recursive noise reduction filter for video based on nonlocal means that exploits temporal correlations.
The second category of video sequence filters uses nonlinear diffusion techniques to increase the sharpness of edges while reducing noise [11,12]. To guarantee numerical stability, this kind of filter discretizes partial differential equations using finite differences together with a semi-implicit temporal discretization scheme. In this setting, an improvement to the nonlinear smoothing method for reducing video noise and enabling a more efficient compression operation is proposed in [14]. In [11] and [12], the space-time diffusion tensor is used to enhance each video frame along the principal direction of intensity change. While these nonlinear techniques provide pleasing visual effects and improve the quality of video sequences [15,16], they suffer from blurring critical video sequence features such as edges and corners. To overcome this limitation, [17] implements an isotropic diffusion equation using a finite difference technique to improve video sequence patterns in terms of optical flow computation. Similarly, [18] presents an anisotropic diffusion technique for real-time video noise reduction; this method replaces the Gaussian diffusion kernel with a median mean value in order to improve image quality while decreasing the filter's processing time through GPU optimization.
To our knowledge, the best video filtering results are currently achieved using nonlocal Bayesian patch methods. In this context, [7] and [10] present a Bayesian 3D patch video filter based on space-time rectangular patches. Each patch is filtered using the maximum a posteriori estimator, each patch being considered as a sample of a Gaussian distribution. Another rapidly growing family of video filtering techniques, such as DnCNN, is based on convolutional neural networks (CNN) [19,20]. Recently, a nonlocal network for noise reduction in video sequences was suggested [16]. The network is constructed by combining a CNN with a self-similarity search: the most similar patches for each patch are determined by an initial non-trainable layer, and this information is then used to predict the clean video sequence with the CNN.

CONTRIBUTION
Although non-local patch-based methods are the state of the art in video filtering, they have the drawback of blurring important video sequence features. To address this drawback, we propose solving an optimization problem that incorporates both a posterior Bayesian term and a prior probability distribution of the observed noisy video sequence. The derived noise reduction formula incorporates a nonlinear diffusion component into the Bayesian expression, increasing filtering in homogeneous regions and decreasing filtering on the video frame's local features. This kind of filtering preserves the video sequence's local features. To take advantage of the noise redundancy, we develop the Bayesian nonlinear filter on a block-wise nonlocal space-time region. This allows the use of redundant information throughout the video sequence while still preserving important features.

Bayesian Formulation
The observed noisy video sequence is usually represented in patch additive form:

\tilde{v} = u + \eta, \qquad (1)

where \tilde{v} is the observed patch, u is the unknown noise-free patch, and \eta is the noise term, modeled as independent and identically distributed Gaussian noise.
Bayesian methods are intensively used in the image filtering literature. Such approaches introduce prior knowledge and impose constraints on the estimation process. The Bayesian estimator, known in practice as the maximum a posteriori (MAP) approach, finds the patch u that maximizes the posterior probability given the observation \tilde{v}:

\hat{u} = \arg\max_{u} P(u \mid \tilde{v}). \qquad (2)

By Bayes' rule, we have:

P(u \mid \tilde{v}) = \frac{P(\tilde{v} \mid u)\, P(u)}{P(\tilde{v})}, \qquad (3)

where \tilde{v} is the observed noisy patch.
As the logarithm is a monotone function and the probability function P is positive, an equivalent formulation of (2) using Bayes' rule (3) leads to:

\hat{u} = \arg\min_{u} \left\{ -\log P(\tilde{v} \mid u) - \log P(u) \right\}. \qquad (4)

Notice that the term P(\tilde{v}) is omitted from the previous equation since it is independent of u. The above optimization problem is composed of the prior patch distribution P(u) and the posterior Bayesian patch distribution P(\tilde{v} \mid u). In what follows, we explicitly define each term of the optimization problem (4).

The prior patch distribution
In the case of the linear model (1) with additive Gaussian noise, the patch prior P(u) is also a Gaussian distribution with covariance matrix C and expectation patch \bar{u}, which means that:

P(u) = A_0 \exp\left( -\tfrac{1}{2} (u - \bar{u})^{T} C^{-1} (u - \bar{u}) \right), \qquad (5)

where A_0 is a normalization coefficient.
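In practice, the Gaussian prior parameters are estimated empirically from a group of similar patches. The following sketch (the function name and array layout are ours, not from the paper) computes the sample mean and covariance of a patch stack with NumPy:

```python
import numpy as np

def empirical_patch_prior(patches):
    """Estimate the Gaussian prior parameters of equation (5) empirically
    from a stack of similar patches.

    patches: array of shape (N, d) -- N similar patches, each flattened
             to a d-dimensional vector.
    Returns (u_bar, C): the sample mean (expectation patch) and the
    sample covariance matrix of the group.
    """
    u_bar = patches.mean(axis=0)
    centered = patches - u_bar
    # Unbiased sample covariance; guard against a single-patch group.
    C = centered.T @ centered / max(len(patches) - 1, 1)
    return u_bar, C
```

Each patch is handled as a column vector, so for k x k patches d = k^2 and C is a k^2 x k^2 matrix.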
In the next section, we calculate the second term of (4), based on the posterior distribution.

The posterior Bayesian patch distribution
To compute the posterior Bayesian patch distribution, consider the video sequence as a dynamical diffusion framework of particles in a space-time domain, with each particle representing a pixel in the video sequence [22]. Then, according to stochastic diffusion theory [23,24], the exact position of a particle at any given time cannot be calculated precisely; instead, only the probability of the particle being located in a particular region can be estimated. Specifically, the transfer probability density of the particle has the following form:

P(\tilde{v} \mid u) = A_1 \exp\left( -\tfrac{1}{2} (\tilde{v} - u)^{T} D^{-1} (\tilde{v} - u) \right), \qquad (6)

where D is the diagonal diffusion matrix and A_1 is a positive normalization coefficient.

Nonlinear Bayesian filter formulation
By substituting (5) and (6) into (4), we obtain, for each observed patch, the equivalent minimization problem:

\hat{u} = \arg\min_{u} \left\{ (u - \bar{u})^{T} C^{-1} (u - \bar{u}) + (\tilde{v} - u)^{T} D^{-1} (\tilde{v} - u) \right\}. \qquad (7)

Since each noisy patch is grouped with its similar neighbors, the covariance C is estimated empirically from the group of N similar patches:

C = \frac{1}{N-1} \sum_{i=1}^{N} (\tilde{v}_i - \bar{u})(\tilde{v}_i - \bar{u})^{T}, \qquad (8)

where \bar{u} is the mean of the group. Setting the gradient of (7) with respect to u to zero yields:

(C^{-1} + D^{-1})\, u = C^{-1} \bar{u} + D^{-1} \tilde{v}. \qquad (9)

By rearranging the terms, we get:

\hat{u} = (C^{-1} + D^{-1})^{-1} \left( C^{-1} \bar{u} + D^{-1} \tilde{v} \right) \qquad (10)

as the optimal solution to problem (7).
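A minimal sketch of the resulting patch estimator, assuming the closed-form Gaussian MAP combination of the prior mean (covariance C) and the observation (diagonal diffusion matrix D); the function name and interface are illustrative:

```python
import numpy as np

def map_patch_estimate(v_noisy, u_bar, C, d_diag):
    """Closed-form Gaussian MAP estimate in the spirit of equation (10):
    solve (C^-1 + D^-1) u = C^-1 u_bar + D^-1 v for the filtered patch,
    where D = diag(d_diag) is the diagonal diffusion matrix."""
    C_inv = np.linalg.inv(C)
    D_inv = np.diag(1.0 / np.asarray(d_diag, dtype=float))
    A = C_inv + D_inv
    b = C_inv @ u_bar + D_inv @ v_noisy
    return np.linalg.solve(A, b)
```

Note the two limiting behaviors: a very small diffusion term makes the estimate trust the observation, while a very large one pulls it toward the prior mean.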

Nonlocal Nonlinear Noise Reduction
Patch-based video models may be thought of as three-dimensional (space-time) extensions of the conventional two-dimensional image block-matching model. By including temporal information, we can take advantage of the redundancy in motion information. As with the image nonlocal patch-based algorithms [24,25,26], the proposed implementation consists of three stages: a) identifying and grouping, in a 4D block, space-time volumetric patches that are similar to a reference patch; b) collaboratively filtering the group, implemented in two steps: 1) the Bayes formula is applied to the 4D block, and 2) the filtered 4D block is returned to its original location; and c) aggregating the collaborative filter outputs. This 3D filtering is performed simultaneously on a group of 4D video blocks. Due to the overlap of the filtered patches, multiple estimates of each pixel need to be combined; aggregation is a special averaging process that makes use of this redundancy.
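The aggregation stage can be sketched as follows, assuming (our own simplification) that each filtered patch carries unit weight; pixels covered by several overlapping patches receive the average of their estimates:

```python
import numpy as np

def aggregate_patch_estimates(frame_shape, estimates, positions, psize):
    """Aggregate overlapping filtered patches into a frame by averaging.

    estimates: list of (psize, psize) filtered patches.
    positions: matching list of (row, col) top-left patch coordinates.
    Pixels covered by several patches receive the average of their
    estimates, exploiting the redundancy of the overlapping patches.
    """
    acc = np.zeros(frame_shape)
    weight = np.zeros(frame_shape)
    for patch, (r, c) in zip(estimates, positions):
        acc[r:r + psize, c:c + psize] += patch
        weight[r:r + psize, c:c + psize] += 1.0
    weight[weight == 0] = 1.0  # leave uncovered pixels at zero
    return acc / weight
```

Weighted variants (e.g. weighting each estimate by its group's variance) follow the same accumulation pattern.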

Space-time patches grouping
We begin by adapting the patch grouping method, extensively used in the image processing literature, to video filtering. Consider a noisy video sequence that has been evenly discretized along the time axis into a set of images.
With reference to Figure 1, assume that we want to filter frame f_t of the image sequence using the previous frame f_{t-1} and the next frame f_{t+1}, and let \tilde{v} be the reference observed noisy patch of the noisy image f_t to be filtered, of size k \times k (seen as a column vector).
Within a centered search window, we select the patches similar to the reference patch \tilde{v} as:

S(\tilde{v}) = \left\{ \tilde{v}_i : \| \tilde{v}_i - \tilde{v} \|^2 \le \alpha \right\}, \qquad (11)

where \alpha is a threshold parameter and S(\tilde{v}) is the set formed by the N closest similar patches to the reference patch \tilde{v}. The extension to the space-time patch grouping volume is performed as follows: for each similar patch \tilde{v}_i, we calculate the displacement vectors that identify the patch in the preceding and subsequent frames. The sequence is thus explicitly filtered along the motion trajectories, using motion estimation as a pre-processing step. As shown in Figure 1, we refer to the space-time volume patch obtained by grouping all similar neighbor patches. The reference patch and all its similar patches are then restored with (10), using the covariance estimation (8) and the expectation block:

\bar{u} = \frac{1}{N} \sum_{i=1}^{N} \tilde{v}_i. \qquad (12)

Algorithm 1 describes the proposed video denoising technique.
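The in-frame grouping step can be sketched as below, assuming (our own choice) a normalized squared patch distance; the function name and parameters are illustrative:

```python
import numpy as np

def group_similar_patches(frame, ref_pos, psize, search_radius, n_patches, alpha):
    """Collect the positions of the n_patches patches most similar to the
    reference patch, scanning a centered search window and keeping only
    candidates whose normalized squared distance is below the threshold
    alpha (the selection rule of equation (11))."""
    r0, c0 = ref_pos
    ref = frame[r0:r0 + psize, c0:c0 + psize]
    h, w = frame.shape
    candidates = []
    for r in range(max(0, r0 - search_radius), min(h - psize, r0 + search_radius) + 1):
        for c in range(max(0, c0 - search_radius), min(w - psize, c0 + search_radius) + 1):
            patch = frame[r:r + psize, c:c + psize]
            dist = np.sum((patch - ref) ** 2) / psize ** 2
            if dist <= alpha:
                candidates.append((dist, r, c))
    candidates.sort(key=lambda t: t[0])          # closest patches first
    return [(r, c) for _, r, c in candidates[:n_patches]]
```

The space-time extension applies the same search in the motion-compensated neighboring frames.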

Nonlinear filter parameters: local feature preservation
To begin, we estimate the diagonal diffusion matrix D = diag(d_1, ..., d_n) for the whole sequence, which enables nonlinear diffusion to be performed, where n is the size of the frame. Each diagonal term of the diffusion matrix is given by:

d = \frac{1}{1 + \lambda_0 + \lambda_1 + \lambda_2}, \qquad (13)

where \lambda_0, \lambda_1, and \lambda_2 are the eigenvalues of the space-time structure tensor, which is defined at each point by:

J_{\rho} = G_{\rho} * \left( \nabla_3 u \, \nabla_3 u^{T} \right), \qquad (14)

where \nabla_3 stands for the space-time gradient operator and G_{\rho} is a Gaussian smoothing kernel. For a space-time homogeneous structure, all the eigenvalues are small (\lambda_i \approx 0); in this case, d \approx 1 and space-time isotropic filtering takes place. Large eigenvalues characterize a pixel on an edge or corner structure shifting along the trajectory of motion. In this instance, d is small, the filtering is attenuated nonlinearly, and hence edges and corners are preserved throughout the filtering process.
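The two limiting behaviors above can be checked with a small sketch; for simplicity the Gaussian smoothing of the tensor field is omitted, and the function names are ours:

```python
import numpy as np

def space_time_structure_tensor(gx, gy, gt):
    """3x3 structure tensor at a single point, built from the spatial
    gradients gx, gy and the temporal gradient gt (Gaussian smoothing of
    the tensor field is omitted in this sketch)."""
    g = np.array([gx, gy, gt], dtype=float)
    return np.outer(g, g)

def diffusion_coefficient(tensor):
    """Diagonal diffusion term of equation (13):
    d = 1 / (1 + lambda_0 + lambda_1 + lambda_2)."""
    lam = np.linalg.eigvalsh(tensor)   # eigenvalues of the symmetric tensor
    return 1.0 / (1.0 + lam.sum())
```

Zero gradients (a homogeneous region) give d = 1, while a strong gradient (an edge or corner moving along its trajectory) drives d toward zero, attenuating the filtering there.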
As described in [16,25,26,27] and [28], we reduce noise via two filtering stages. First, we group the space-time neighborhood of each noisy patch in line 3 using the patch distance (11), as illustrated in Figure 1. Following that, we use equation (12) to estimate the mean and covariance patches. By using the Bayes rule stated in equation (10), we obtain a basic estimation of the reference patch and all its similar patches in line 5. In the second stage, we calculate the final estimation patch using the basic estimate as an oracle. That is, we construct the space-time volume by grouping all similar neighbor patches; the mean and covariance matrices are computed in line 6, and the final noise-free patch is estimated in line 7 using patches from the basic estimate.

EXPERIMENTAL RESULTS
In this section, the proposed nonlinear nonlocal Bayes video filtering algorithm (NNBA) is evaluated on grayscale and color sequences. The first application compares the filtering outputs of three state-of-the-art methods on real grayscale video sequences contaminated with artificial Gaussian noise. We calculate the PSNR (Peak Signal-to-Noise Ratio) as an objective performance measure to quantify the difference between the processed and original video sequences, and we consider qualitative evaluation via visualization of the filtered video sequences. The standard video sequences "Tennis", "Flower Garden", "Salesman", and "Miss America" are used for the evaluation of the proposed filter. The sequences "Salesman" and "Miss America" are characterized by fast-varying and slow-varying objects on stationary backgrounds. The "Tennis" sequence has finely textured regions, rotational motion, zooming, and panning, while "Flower Garden" contains a wide textured region and steady translational motion. We also evaluate our filter on color video sequences: the "Flower Garden" sequence with additive Gaussian noise, and the lizard and TV sequences, which are naturally contaminated with noise.
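For reference, the PSNR measure used throughout the evaluation can be computed as follows (a standard definition; the peak value 255 assumes 8-bit sequences):

```python
import numpy as np

def psnr(reference, processed, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between an original and a
    processed sequence (arrays of identical shape)."""
    err = np.asarray(reference, dtype=float) - np.asarray(processed, dtype=float)
    mse = np.mean(err ** 2)
    if mse == 0.0:
        return float('inf')   # identical sequences
    return 10.0 * np.log10(peak ** 2 / mse)
```

Averaging this value over all frames of a sequence yields the averaged PSNR reported in the tables.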

Quantitative evaluation
To filter the noisy video sequences, we must set the search window dimension, the patch size, and the size N of each pixel's space-time neighborhood. To simplify the parameter selection procedure, we pick the set of parameters that yields the filter's highest mean PSNR when applied to the noisy video sequences. Numerous tests were conducted to fine-tune these parameters, which have a major impact on the overall filtering efficiency. We use a single set of values in all experiments: the number of nearest similar patches is set to N = 21, the size of the search window is set to 21, and the size of the reference patch is set to 9. As a consequence, the filtering results are enhanced. The motion field for the current video frame is estimated using the block matching method with a block size of 16 and a search parameter of 7. The Gaussian convolution parameter in equation (14) is set so that, for each frame, we calculate the gradient of each pixel inside a 5 x 5 window.
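A minimal sketch of the block matching step with the parameters above (block size 16, search parameter 7); this is a generic exhaustive SAD search, not the authors' specific implementation:

```python
import numpy as np

def block_match(prev_frame, curr_frame, r0, c0, bsize=16, search=7):
    """Exhaustive-search block matching: find the displacement (dr, dc),
    within +/- search pixels, that minimizes the sum of absolute
    differences (SAD) between the block of the current frame at (r0, c0)
    and the corresponding block of the previous frame."""
    block = curr_frame[r0:r0 + bsize, c0:c0 + bsize]
    h, w = prev_frame.shape
    best, best_sad = (0, 0), np.inf
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = r0 + dr, c0 + dc
            if r < 0 or c < 0 or r + bsize > h or c + bsize > w:
                continue   # candidate block falls outside the frame
            sad = np.abs(prev_frame[r:r + bsize, c:c + bsize] - block).sum()
            if sad < best_sad:
                best_sad, best = sad, (dr, dc)
    return best
```

The returned displacement locates, in the previous frame, the block that best matches the current reference block; chaining these vectors gives the motion trajectory along which the space-time patches are grouped.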
To process each reference patch in the current frame, the filtering process involves both the previous and subsequent frames along the estimated motion trajectory, and at each patch center a space-time neighborhood of dimension 21 is evaluated, as shown in Figure 1. The proposed filter was compared to three known filters from the literature: the video block-matching filter (VBM4D) [29], the video network (DnCNN) [19], and the motion-adaptive space-time filter built on the K-nearest neighborhood (3DKNN) [30].
To conduct this comparison, we applied the proposed filter to noisy sequences and compared the output with the same sequences filtered using the three comparative techniques. We used sequences available in the public domain to offer an impartial comparative evaluation.
In the first set of experiments, we added a modest quantity of artificial Gaussian noise, with variances σ = 15 and σ = 20, to the original video sequences. As shown in Table 1, the results are reported in terms of averaged PSNR. The proposed filter performs best in terms of PSNR for almost all test video sequences, surpassing the comparable filters. We notice that the network DnCNN overfits motion patterns during training and fails when it encounters a different motion. In Figure 8, the graph depicts each frame's PSNR for the four processed sequences.

Qualitative evaluation
Along with the quantitative evaluation, we compare the visual quality of the proposed filter's output with that of the three filtering techniques. Since all filters are particularly effective at reducing low levels of noise, we purposefully compared the results on high-noise sequences with noise level σ = 20. As illustrated in Figure 2, Figure 3, and Figure 4, the proposed filter achieves a remarkable reduction of noise with less temporal and spatial blurring than the other filters.
We apply the proposed algorithm to filter two color video sequences, one with natural noise and one with artificial Gaussian noise. In Figure 6, we illustrate a video sequence with artificial noise; as in the grey-level case, the visual evaluation shows that the proposed approach achieves significant artificial noise removal. In Figure 7, we display a video sequence with natural noise introduced by the acquiring TV camera. The results of noise reduction on the lizard sequence show, in the expanded view of the filtered frames, that the proposed filter can effectively reduce noise while preserving essential frame features. Figure 9 illustrates the results of noise reduction on a television video sequence, demonstrating successful noise reduction.

CONCLUSION
This paper proposes a new nonlinear Bayesian framework for noise reduction in video sequences. The proposed filter is derived from a variational problem that integrates a Bayesian term with a prior probability distribution of the observed noisy video sequence. The resulting formulation includes a nonlinear diffusion component that preserves critical features such as edges and corners. To exploit noise redundancy, we constructed the Bayesian nonlinear filter on a block-wise nonlocal space-time region, which enables the use of the intrinsic Bayesian filtering framework.