
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-02212008-223613


Thesis type
PhD thesis
Author
MASINI, ANDREA
Email address
andrea.masini@iet.unipi.it
URN
etd-02212008-223613
Title
Metodi di fusione di sequenze video nell’infrarosso per il miglioramento della capacità di visione (Methods for fusing infrared video sequences to improve vision capability)
Scientific disciplinary sector
ING-INF/03
Course of study
TELERILEVAMENTO (Remote Sensing)
Supervisors
Supervisor Prof. Corsini, Giovanni
Supervisor Prof. Diani, Marco
Keywords
  • image fusion
  • video fusion
Defense session start date
13/03/2008
Availability
Not available for consultation
Release date
13/03/2048
Abstract
Image fusion methods combine the visual information contained in different source images into a single composite image to enhance human perception and interpretation.
Such a tool is of paramount importance in many applications where extending human vision is crucial to improving operator performance.
Applications are easily found in many fields, for example vehicle driving and aircraft piloting, or platform-based real-time monitoring.
The use of different infrared sensors can be very useful for perceiving the scene in poor visibility conditions (night, rain and fog) and can consequently aid operators in their work. Multispectral infrared image sources can be fused to synthesize the salient information collected in the different channels, enabling better scene interpretation and improving situational awareness.
Image fusion techniques are expected to achieve several objectives, which can be summarized as follows:
a) the integration of images from different sensors has to produce information that cannot be obtained by viewing the sensor outputs separately and consecutively;
b) the information extracted from the input images must be salient with respect to the specific application and must improve the semantic interpretation of the image. Fusion methods should not discard any salient information from any source;
c) an essential problem in merging images is ‘pattern conservation’: important details of the component images must be preserved in the resulting composite image, so that the incomplete representation of an object in one image may be completed with information from the other (complementary information);
d) the fusion should not introduce any artefacts that can distract or mislead a human observer;
e) the merging operation must reconcile the disparities between the images coming from the input sensors. For example, the sensor output images may not be equally reliable. Such disparities have to be taken into account when fusing the information from such sources;
g) the fusion must be reliable and robust, and must tolerate disturbances and errors (noise and misregistration);
h) common but contrast-reversed information must be treated appropriately: various objects and regions may occur in both images but with opposite contrast, so the direct approach of adding and averaging the source images is not satisfactory in this case.
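The contrast-reversal problem in the last objective can be illustrated with a toy example. The sketch below is not taken from the thesis: the two scan lines, the `local_contrast` helper and the selection rule are hypothetical, chosen only to show why plain averaging can wash out an object that a magnitude-based selection keeps.

```python
# Hypothetical 1-D scan lines from two sensors: the same object (pixels 2-3)
# appears bright in one channel and dark in the other (contrast reversal).
band_a = [100, 100, 160, 160, 100, 100]
band_b = [100, 100, 50, 50, 100, 100]


def local_contrast(row):
    """Deviation of each pixel from the mean of its row (an assumed, crude contrast measure)."""
    mean = sum(row) / len(row)
    return [value - mean for value in row]


def fuse_average(a, b):
    """Pixel-wise average of the two sources."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def fuse_abs_max(a, b):
    """Keep, per pixel, the source whose contrast has the larger magnitude."""
    contrast_a, contrast_b = local_contrast(a), local_contrast(b)
    return [x if abs(p) >= abs(q) else y
            for x, y, p, q in zip(a, b, contrast_a, contrast_b)]


averaged = fuse_average(band_a, band_b)
selected = fuse_abs_max(band_a, band_b)

# Object-to-background difference after each fusion rule.
print("average :", max(averaged) - min(averaged))
print("abs max :", max(selected) - min(selected))
```

In this toy case the averaged line keeps only a small fraction of the object's contrast, while the selection rule preserves the contrast of the stronger source.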
The fusion process can be performed at different levels of abstraction, i.e. at different levels of information representation: pixel, feature and symbol level [18], [12].
Pixel-level image fusion is fusion at the lowest processing level: each pixel in the fused image is determined from a set of pixels in each input source.
Fusion at feature level requires the extraction of various features contained in the input sources. Typical features are edges, shapes, textures, etc. [7].
Fusion at symbol level is the combination of information at the highest level of abstraction: individual decisions are made for each sensor and then combined to reach the final decision [18].
In this work we consider multiresolution-based fusion methodologies that generate the final image pixel by pixel, leaving the interpretation of the objects’ shapes to the human operator. Compared with the other levels of abstraction, pixel-level approaches reduce possible artefacts and allow the human observer to perceive, and hence to easily detect and recognize, the objects in the scene.
Multiresolution-based fusion techniques (MRBFTs) are the most widely studied in the literature, and their effectiveness has been demonstrated in a number of applications ([9], [37], [47]). MRBFTs attempt to reproduce the ability of the human visual system to perceive the details in a scene [25], separating them at the different scales.
Examples of multiresolution decompositions are the pyramid and the wavelet transforms. A pyramid transform is a collection of band-pass copies of the original image, reduced in size at regular steps. Many pyramid-based decomposition schemes have been proposed over the years [25].
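As an illustration of the pyramid idea, the following is a minimal sketch of a Laplacian-style pyramid in pure Python. It is an assumption-laden simplification and not the scheme used in the thesis: the reduce step averages 2x2 blocks and the expand step replicates pixels, whereas classical pyramid schemes typically use a smoothing kernel. The function names and the test image are hypothetical, and the image sides are assumed to be divisible by 2 at every level.

```python
def reduce_level(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    height, width = len(img), len(img[0])
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
             for c in range(0, width, 2)]
            for r in range(0, height, 2)]


def expand_level(img):
    """Double the resolution by replicating every pixel into a 2x2 block."""
    out = []
    for row in img:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def build_pyramid(img, levels):
    """Return [detail_0, ..., detail_{levels-1}, coarse_residual]."""
    pyramid, current = [], img
    for _ in range(levels):
        smaller = reduce_level(current)
        predicted = expand_level(smaller)
        # Band-pass detail: what the coarser level cannot predict.
        detail = [[a - b for a, b in zip(row_a, row_b)]
                  for row_a, row_b in zip(current, predicted)]
        pyramid.append(detail)
        current = smaller
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Invert build_pyramid by expanding and adding details, coarse to fine."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        upsampled = expand_level(current)
        current = [[a + b for a, b in zip(row_a, row_b)]
                   for row_a, row_b in zip(upsampled, detail)]
    return current


# Hypothetical 4x4 test image.
image = [[float((r * 4 + c) % 7) for c in range(4)] for r in range(4)]
pyr = build_pyramid(image, levels=2)
restored = reconstruct(pyr)
print([len(level) for level in pyr])  # number of rows stored at each level
```

Because each detail level stores exactly the residual of the prediction from the coarser level, adding the details back in reverse order recovers the original image.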
The basic strategy of an MRBFT is the following: the source images are decomposed through a multiresolution transform, then specific pixel merging rules are applied at each level of the decomposition to generate the multiresolution representation of the merged image. The final fused image is then obtained by performing an inverse multiresolution transform. In this work, many multiresolution transforms and their corresponding merging rules are analysed.
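The three steps above (decompose, merge, invert) can be sketched on 1-D signals. The example below is only a sketch under stated assumptions: it uses an unnormalised Haar wavelet transform as the multiresolution decomposition, an absolute-maximum rule on the detail coefficients and an average on the coarsest approximation. These choices, the function names and the input signals are illustrative and are not claimed to match the configurations analysed in the thesis. Signal lengths are assumed to be divisible by 2 at every level.

```python
def haar_forward(signal, levels):
    """Unnormalised Haar transform: returns (approximation, details finest-first)."""
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])
        approx = [(a + b) / 2 for a, b in pairs]
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward, rebuilding from the coarsest level to the finest."""
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for s, d in zip(current, detail):
            rebuilt.extend([s + d, s - d])
        current = rebuilt
    return current


def fuse(sig_a, sig_b, levels=2):
    """Decompose both sources, merge level by level, then invert."""
    approx_a, details_a = haar_forward(sig_a, levels)
    approx_b, details_b = haar_forward(sig_b, levels)
    # Assumed rule for the coarsest level: plain average.
    fused_approx = [(x + y) / 2 for x, y in zip(approx_a, approx_b)]
    # Assumed rule for details: keep the coefficient of larger magnitude.
    fused_details = [[x if abs(x) >= abs(y) else y for x, y in zip(da, db)]
                     for da, db in zip(details_a, details_b)]
    return haar_inverse(fused_approx, fused_details)


# Hypothetical sources: each one contains a detail the other lacks.
sig_a = [10, 10, 10, 10, 50, 10, 10, 10]
sig_b = [10, 40, 10, 10, 10, 10, 10, 10]
fused = fuse(sig_a, sig_b)
print(fused)
```

In this toy case both peaks, one from each source, remain visible in the fused signal, which is the complementary-information behaviour a merging rule is meant to provide.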
A fundamental issue in image fusion is how to evaluate the performance of a fusion scheme. The improvement depends on the particular scenario, the sensors used, the lighting conditions and, obviously, on the capabilities of the human observer. It is therefore very difficult to define general procedures for comparing fusion results.
Traditionally, the quality of video sequences is evaluated subjectively by human evaluators. The main disadvantages of this method are that it requires an adequate number of evaluators, which makes it time consuming and expensive, and that it cannot be done in real time.
As a result, considerable research effort has been devoted to the development of automatic objective methods for video quality measurement. Performance measures are essential for various reasons: to ascertain the possible benefits of fusion; to compare results obtained with different algorithms; to obtain an optimal setting of the parameters of a specific fusion algorithm. A good quality index should extract all the perceptually important information from the input images and measure the ability of the fusion process to transfer this information into the final image with the highest accuracy (that is, minimising the number of artefacts or the amount of distortion). For this purpose, we propose a new method to evaluate and compare the performance of different fusion strategies, which integrates the scores of two well-established metrics proposed in [29] and [54]. The procedure is illustrated with reference to a set of experimental data. The experimental analysis shows that figures of merit are a low-cost and effective tool for selecting the appropriate fusion method. In particular, the analysis, conducted on a set of experimental data acquired by a prototype system in typical automotive scenarios, shows that the method based on a Laplacian pyramid decomposition and an absolute maximum selection rule yields the best results in the scenarios considered.
File