A robust FLIR target detection employing an auto-convergent pulse coupled neural network



Introduction
Automatic target detection (ATD) identifies targets automatically from data acquired by sensors such as RAdio Detection And Ranging (RADAR), Synthetic-Aperture Radar (SAR), and Forward-Looking Infrared (FLIR) Bhanu (1986). These sensors are usually placed far from the observed object, so the data are likely to be corrupted by heavy environmental clutter. Consequently, the data are difficult to interpret, and it is hard for a human expert to locate the target in real time. An ATD system using FLIR can locate and identify suspected objects at night, where infrared wavelengths play a pivotal role, although identifying small targets in FLIR imagery remains a challenging task Gao, Zhang, and Li (2012); Shi et al. (2018). A FLIR sensor can see through various atmospheric conditions; more specifically, it captures the heat signature of an object, which makes it easier to detect even camouflaged objects. This system is therefore widely used in the military field, as most surveillance and targeting takes place in darkness or low-light environments.
With the rapid expansion of practical applications, traditional ATD methods for small target detection, such as top-hat Bai and Zhou (2010), max-mean or median filters Deshpande et al. (1999), and morphological filters Shaoa, Zhua, and Liub (2008), are widely used to deal with background clutter. In addition, a small target detection technique based on sparse ring representation (SRR) is proposed in Gao, Zhang, and Li (2012). SRR captures the difference between targets and background through its active graphical structure. Meanwhile, feature-based approaches such as Zernike moments, multi-feature, and fusion-based physically relevant features focus on the invariance, robustness, and discriminative power of the target features for ATD. These are mostly applied to SAR imagery Clemente et al. (2015); Kim, Song, and Kim (2018). Numerous methods based on neural networks Cheng, Zhou, and Han (2016) and manifold learning Li et al. (2010) have also been applied to target detection. However, small target detection against a cluttered background remains very difficult. Recently, bio-inspired processes such as the hierarchical model Serre et al. (2007) and the derived kernel Li et al. (2012) have been receiving more attention. The visual cortex model (VCM) is another physiologically motivated information processing prototype that makes predictions and acts like a scientific eye. It led to the design of a neural network named the pulse coupled neural network (PCNN) Subashini and Sahoo (2014), an effective simulation tool for learning synchronous pulse dynamics that can address the above-mentioned challenges adaptively. The main challenge of the PCNN is deciding the convergence criterion, which so far has been done through experimental and/or manual procedures, is problem specific, and incurs a large time and space complexity burden. In this paper, a robust and adaptive ATD is proposed that employs an improved PCNN to serve both purposes: target detection and clutter rejection.
This work introduces an auto-convergence criterion to extend the segmentation capability of the PCNN and to create a powerful, adaptive ATD tool for low-contrast imagery. The proposed algorithm is then validated through statistical measurements and achieves better detection performance than the other techniques considered.

Proposed methodology
Here, the PCNN model is modified to obtain an automatically converged pulse coupled neural network (AC-PCNN) that provides a robust ATD tool. The flow chart of the proposed work is shown in Figure 1. The proposed ATD process consists of the following consecutive stages: data processing from the FLIR image sequence, adaptive segmentation through PCNN, ROI selection, and template matching.

Auto-convergent pulse coupled neural network
Eckhorn et al. (1990) first developed a neural model of the cat visual cortex that produces binary images from visual impressions. This network was later modified and developed to create an image by effective simulation of synchronous behaviour Subashini and Sahoo (2014). The PCNN is a third-generation, single-layer, two-dimensional, laterally connected neural network. It consists of three parts: input, modulation, and pulse generator. In the input part, each neuron receives signals through the feeding ($F_{i,j}$) and linking ($L_{i,j}$) channels, where $(i, j)$ is the location of the neuron in the image and $n$ is the iteration index. Feeding consists of the external source ($S_{i,j}$), which is the normalized pixel value of the input image. Linking is represented by the constant synaptic weights ($w_{i,j,k,l}$) that connect the neuron with its specified neighbouring neurons, referred to as $N_{i,j}$, where $(k, l)$ is the location of a neighbour of neuron $(i, j)$. The modulation part is the nonlinear combination of the feeding and linking signals through the linking coefficient $\beta$, which yields the internal activity ($U_{i,j}$). Finally, the pulse generator part produces pulses to fire neurons, using an adaptive threshold variable ($T_{i,j}$) as a step function to control the firing event. After thresholding, it produces the pulse output $O_{i,j}$ based on the mathematical model shown in Eqs.
where $\alpha_T$ is the threshold decay time constant and $V_T$ is the threshold normalization constant. Because the PCNN involves a large number of parameters, an auto-convergence criterion is introduced here to reduce the computational overhead and simplify the simulation. The target chip size is used as prior knowledge to make the convergence automatic. After each iteration, the output image is compared with the target chip to decide the cardinality of the target using the relationship $\sum_{i=1}^{r} \sum_{j=1}^{c} O_{i,j}[n] > \text{target size}\ (m \times n)$, where $r$ and $c$ are the number of rows and columns of the output image. The criterion works because the threshold decreases exponentially, which causes more neurons to fire in the next step. If an iteration fires more neurons than the size of the target chip, the network is considered to have converged, and the output of the previous iteration is taken as the final outcome.
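The convergence rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the update equations are a commonly used simplified PCNN form assumed here (feeding equals the stimulus, linking is a weighted sum of neighbouring pulses, and the threshold decays by $e^{-\alpha_T}$ and is raised by $V_T$ when a neuron fires). The 3 × 3 weight kernel, the names `neighbour_sum` and `ac_pcnn`, the `max_iter` cap, and the choice to skip the initial all-fire iteration in the convergence test are likewise assumptions. The default parameter values are the ones reported in the parameter fitting section.

```python
import numpy as np

def neighbour_sum(Y, w):
    """Weighted sum over the 3x3 neighbourhood of every neuron (zero padded)."""
    rows, cols = Y.shape
    P = np.pad(Y, 1)  # constant zero padding
    out = np.zeros_like(Y, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += w[di, dj] * P[di:di + rows, dj:dj + cols]
    return out

def ac_pcnn(S, chip_shape, beta=3.0, alpha_T=0.2, V_T=20.0, max_iter=200):
    """Simplified PCNN with the auto-convergence rule.

    S          : normalized input image (values in [0, 1]).
    chip_shape : (rows, cols) of the target chip, used as prior knowledge.
    Returns the binary output of the iteration before the one whose number
    of fired neurons first exceeds the chip area, and that iteration's index.
    """
    S = np.asarray(S, dtype=float)
    # Assumed linking weights (hypothetical choice, not taken from the paper).
    w = np.array([[0.5, 1.0, 0.5],
                  [1.0, 0.0, 1.0],
                  [0.5, 1.0, 0.5]])
    target_area = chip_shape[0] * chip_shape[1]
    T = np.zeros_like(S)   # adaptive threshold, initialized to zero
    O = np.zeros_like(S)   # pulse output, initialized to zero
    prev_O, prev_n = O.copy(), 0
    for n in range(1, max_iter + 1):
        L = neighbour_sum(O, w)              # linking from previous pulses
        U = S * (1.0 + beta * L)             # internal activity (modulation)
        O = (U > T).astype(float)            # step-function pulse generator
        T = np.exp(-alpha_T) * T + V_T * O   # decay, then raise where fired
        # Assumption: iteration 1 fires every non-zero pixel, so the
        # convergence test only starts from the second iteration.
        if n > 1 and O.sum() > target_area:
            return prev_O, prev_n
        prev_O, prev_n = O.copy(), n
    return prev_O, prev_n  # no convergence within max_iter

# Toy input: a 3x3 bright target on a dim, uniform background.
S = np.full((20, 20), 0.2)
S[8:11, 8:11] = 1.0
O_final, n_final = ac_pcnn(S, chip_shape=(3, 3))
print(n_final, int(O_final.sum()))
```

The sketch follows the stated relationship literally, comparing the number of neurons fired in the current iteration with the chip area. On real FLIR frames the result would depend on the actual update equations and weights, which are not reproduced here.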

Region of interest and template matching
The final iteration produces an image with bright-intensity regions that may contain the target, called regions of interest (ROIs). Each ROI is then cropped, and the target template is moved across it to identify the target location more precisely. Template matching finds the best-matching position within the ROI over which the template is placed, using the Euclidean similarity measure given by the formula in Eq. (6), where $d$ represents the similarity between ROI and template, $R$ denotes the ROI, $T_m$ indicates the selected image template, and $u$ and $v$ are the coordinates of $R$ and $T_m$. $T_r$ and $T_c$ are the number of rows and columns of $T_m$. A dimension mismatch between ROI and template may occur here, specifically where the template is slightly larger than the ROI. To avoid this, the ROI size is set to three times the size of the target chip, shown in Figure
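The sliding-window matching step can be illustrated with a short Python sketch. It assumes that the Euclidean measure is the root of the summed squared differences between the template and the ROI window at offset $(u, v)$, and that the smallest distance marks the best match. The function name `match_template` and the toy arrays are hypothetical.

```python
import numpy as np

def match_template(roi, template):
    """Slide the template over the ROI and return the best offset.

    Returns ((u, v), d): the top-left offset with the smallest Euclidean
    distance between the ROI window and the template, and that distance.
    """
    roi = np.asarray(roi, dtype=float)
    template = np.asarray(template, dtype=float)
    T_r, T_c = template.shape
    R_r, R_c = roi.shape
    if T_r > R_r or T_c > R_c:
        raise ValueError("template must not be larger than the ROI")
    best_uv, best_d = None, np.inf
    for u in range(R_r - T_r + 1):
        for v in range(R_c - T_c + 1):
            window = roi[u:u + T_r, v:v + T_c]
            d = np.sqrt(np.sum((window - template) ** 2))
            if d < best_d:
                best_uv, best_d = (u, v), d
    return best_uv, best_d

# Toy example: the ROI is three times the chip size in each dimension.
template = np.arange(1.0, 10.0).reshape(3, 3)
roi = np.zeros((9, 9))
roi[4:7, 2:5] = template  # place an exact copy of the chip in the ROI
best_uv, best_d = match_template(roi, template)
print(best_uv, best_d)
```

Making the ROI larger than the template, as the text proposes, guarantees that at least one valid window exists, so the size check never fails.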

Experimental and result analysis
This section demonstrates the impact of the proposed AC-PCNN method, which is then compared with other existing methods using statistical measurements such as precision, recall, accuracy, and the receiver operating characteristic (ROC) curve Davis and Goadrich (2006). The method has been tested on a large number of FLIR images, and the visual outcomes have been analysed through template matching on selected ROIs. The same process has been executed for all the compared methods so that the results can be evaluated and judged fairly.

Data description
The FLIR dataset SENSIAC (2008) is used for this investigation. It contains real-time videos of, and information about, army vehicles in ground scenarios. The videos (later extracted as images) cover situations such as different weather conditions, night time, engine on/off, various aspect angles (0° to 360°), far targets, nearby targets, and camouflaged targets. Here, the experiment is conducted on data collected through the MWIR sensor. For data collection, the army vehicles were driven within a specific range from the sensor at night, as detailed in Table 1. Nine types of battle tank targets are considered, and results are shown here only for the T72 battle tank, which is marked in bold. Forty night vision videos were tested, covering two different scenes, high and low cluttered background, depending on the target range from the sensor. Image frames were then extracted from these videos. Each video consists of 1024 frames that contain various viewpoints and angles of a moving target. Twenty frames were selected from each video at a fixed interval to cover all possible views of a target. The original image size is 640 × 512 pixels, with two target chip sizes: 10 × 5 pixels for high and 30 × 20 pixels for low cluttered background. In total, the study was conducted on 800 distinct images. The experiment was carried out using Matlab R2016b on an Intel(R) Core(TM) i5 processor @ 3.30 GHz running the Windows 7 Enterprise 64-bit operating system with a 7856 MB NVIDIA Graphics Processing Unit (GPU).

Parameter fitting
The proposed AC-PCNN algorithm includes some parameters that have been fixed by the method of successive estimation. These are the threshold decay constant ($\alpha_T$), the normalization constant ($V_T$), and the linking strength ($\beta$) introduced in Section 2. The value of $V_T$ is set to 20 because it must be high enough to prevent the neurons from firing again immediately. The values of $\beta$ and $\alpha_T$ are set experimentally to 3 and 0.2, respectively. This parameter selection phase enables the algorithm to find the correct position of a target with less computational complexity. The parameters were therefore analysed quantitatively and tuned to deliver the best true target detection performance on the considered data. The selection of parameters enables the model to achieve optimal results and is essential for the correctness of the proposed ATD.

Result analysis and comparison
AC-PCNN produces a segmented image after each iteration based on the intensity levels of the input image. The results obtained from the iterations of AC-PCNN are shown in Figure 3. In the first iteration all neurons fire, as all variables except $F_{i,j}$ are initialized to zero. The threshold normalization constant $V_T$ then raises $T_{i,j}$, and this high threshold prevents the neurons from firing again immediately. The associated decay parameter $\alpha_T$ reduces $T_{i,j}$ exponentially over the iterations, so that minor changes in the resulting binary images can be identified. The network takes approximately twenty-one iterations after the first one before neurons fire again, as shown in Figure 3. By the 23rd iteration, the brighter intensity values exceed the threshold and the corresponding neurons start firing. The images from iterations 23 to 29 illustrate the growth of the target region as more neurons fire. As $T_{i,j}$ continues to decay, the 30th iteration includes more of the background, producing insignificant boundary regions that contain background or clutter. Therefore, the 29th iteration meets the convergence criterion (described in Section 2.1) and is taken as the final output. All frames are processed in the same way to reach their converged iterations. Other established detection methods, namely Otsu thresholding Otsu (1979), centre surround difference (CSD) Sun et al. (2005), saliency extraction Hou and Zhang (2007), and fuzzy energy based active contour (FEAC) Mondal, Ghosh, and Ghosh (2016), were applied to the FLIR data for comparison with the results of AC-PCNN. The comparison between AC-PCNN and CSD is displayed in Figure 4.
Here, two low-clutter and two high-clutter FLIR images are chosen to illustrate the difficulty of handling noisy input images and to demonstrate the robustness of the proposed method. Figure 4 shows the results for these images. Figure 4a(i)-(iv) shows the original input images, and the associated spatial distributions are displayed in Figure 4b(i)-(iv), which reach a maximum intensity value of approximately 40. The 3D spatial distribution with its colour bar indicates the different intensity levels arising from the thermal variability of target and background. Figure 4a(i) and 4a(iii) have similar spatial characteristics in terms of a homogeneous 3D perspective view. Although a sharp peak is detected in both cases, due to the headlight of the tank, detecting the shape of the whole target is the most challenging part of the detection process. Figure 4a(ii) and 4a(iv) both contain more visible regions than the other two, but the thermal variability of these images makes the target harder to distinguish from the immediate background. The distribution edges of the object and background can be seen in Figure 4b(ii), where no clearly separate peak is found for the target, but the colour bar indicates the presence of the target through high intensity values (shown in yellow).
The results obtained from CSD are shown in Figure 4c(i)-(iv), and the corresponding 3D histograms are plotted in Figure 4d(i)-(iv). The peaks of the distribution indicate the tentative target regions, which contain a large number of false alarms. Progressive sub-sampling and low-pass filtering create the differences between fine and coarse scales that form the intensity feature map. However, due to intensity overlap, CSD cannot properly reveal the contrast between the target tanks and the bushes behind them, which have similar intensity and reflectivity. It often misses the actual target position because of the prominent background, which may be caused by the thermal inconsistency of the input images. As a result, it produces binary output images that contain the target along with a large number of false alarms per scene. Figure 4e(i)-(iv) shows the AC-PCNN results, which converged at the 27th, 23rd, 20th, and 21st iterations, respectively (from left to right). The intensity distributions are shown in Figure 4f(i)-(iv), where the target regions are found accurately and are visible as peaks. This implies that a large difference between target and background intensity (as in the high-clutter images) increases the rate at which neurons fire and makes the proposed method converge earlier. Subsequently, to acquire the true shape of the target, the ROI is cropped and template matching is performed using the target chip in an overlapping manner. The region with the lowest distance between ROI and target chip is taken as the best match. The final results are shown in Figure 4g(i)-(iv), with a bounding box drawn around the target region. In addition, histogram equalization is performed to enhance the visibility of the output images, which are the final results of the proposed ATD.

Performance evaluation
The following baseline methods have been explored to compare and assess the results of the proposed method: Otsu thresholding Otsu (1979), CSD Sun et al. (2005), saliency extraction Hou and Zhang (2007), and fuzzy energy based active contour (FEAC) Mondal, Ghosh, and Ghosh (2016). For a fair evaluation, ROI selection and template matching have been included in the final ATD process for all baseline methods. The experimental results have been validated through the confusion matrix and through precision, recall, and accuracy measurements. The confusion matrix is determined by the true positive (TP), false positive (FP), false negative (FN), and true negative (TN) counts. Potential targets are categorized in two ways: actual target (shown in columns) and predicted target (shown in rows).
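For reference, the three scalar measures follow from the confusion matrix counts through their standard definitions, as in the short Python sketch below. The function name `detection_metrics` and the counts in the example call are hypothetical and are not taken from Table 2 or Table 3.

```python
def detection_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from confusion matrix counts."""
    total = tp + fp + fn + tn
    # Precision: fraction of predicted targets that are actual targets.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of actual targets that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: fraction of all decisions that were correct.
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy

# Hypothetical counts, for illustration only.
precision, recall, accuracy = detection_metrics(tp=8, fp=2, fn=1, tn=9)
print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```

The guards against empty denominators only matter for degenerate cases, such as a method that predicts no targets at all.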
The results of the proposed method are compared with those of the other methods in Table 2, which shows that CSD detects 421 of the 800 targets accurately. CSD performs adaptive image thresholding by using a Gaussian pyramid for progressive low-pass filtering and sub-sampling of the input image. The number of false alarms obtained in each scene is quite high, so its detection rate is very low compared to the other ATD methods. Otsu thresholding and saliency extraction both detect 434 true targets, but their type I and type II errors differ because they use different approaches to detect targets: Otsu uses thresholding for segmentation, whereas saliency employs feature mapping. The threshold selected by Otsu depends solely on the pixel intensities, so if the global distributions of target and background vary widely, the detection performance degrades. Saliency extraction employs feature mapping at a given scale using a feature vector for each pixel, instead of separate saliency maps for the scalar values of each feature. However, it fails to detect the targets in FLIR imagery because of the low contrast and thermal variability between target and background. FEAC, an active contour based method, has also been implemented for comparison with the proposed work. This method uses a fuzzy energy based active contour that takes local spatial information into account. It can deal with noisy and blurred images, discontinuous edges, and high intensity inhomogeneity. However, it struggles to segment small targets in low-contrast, highly cluttered FLIR images. FEAC detects 484 true targets in the given FLIR images and therefore also fails to match the ATD performance of the proposed work.
Table 2 thus shows that the proposed method detects the highest number of targets accurately, 718 out of 800, which is the best performance obtained in this experiment. Accuracy is measured for all methods as the proportion of correctly detected and correctly rejected targets over the complete experimental dataset. The detection rate and the probability of accuracy are calculated via the precision and recall measurements tabulated in Table 3, and the values are plotted as a bar graph in Figure 5(a). The results show that the proposed method achieves higher precision, recall, accuracy, and area under the curve (AUC) values (marked in bold in Table 3) than the existing methods. In addition, the ROC curve is displayed in Figure 5(b) to show the performance of the proposed method graphically. Overall, the performance of the proposed method is superior to that of the baseline methods explored in this work.

Conclusions and future work
The proposed method demonstrates the ability of the AC-PCNN to segment highly cluttered FLIR images through an efficient adaptive thresholding method. In addition, this work contributes an auto-convergent PCNN (AC-PCNN), derived from the PCNN, for the ATD process. The combination of ROI selection and template matching gives a good approximation of the accurate target shape. The proposed work obtains promising results compared to the existing methods. This work is currently limited to detecting a target in an image; an extension to target recognition is underway, towards a fully automatic target recognition (ATR) system.