Engineering Media Technology

Remote-Sensing Image Classification

Description

This cluster of papers focuses on advances in hyperspectral image analysis, remote sensing, and classification. It covers topics such as deep learning, change detection, spectral unmixing, feature extraction, and object-based analysis for remote sensing applications.

Keywords

Hyperspectral; Image Analysis; Remote Sensing; Classification; Deep Learning; Change Detection; Spectral Unmixing; Feature Extraction; Object-Based Analysis; Support Vector Machines

Recently, convolutional neural networks have demonstrated excellent performance on various visual tasks, including the classification of common two-dimensional images. In this paper, deep convolutional neural networks are employed to classify hyperspectral images directly in the spectral domain. More specifically, the architecture of the proposed classifier contains five layers with weights: the input layer, the convolutional layer, the max pooling layer, the full connection layer, and the output layer. These five layers are applied to each spectral signature to discriminate it from the others. Experimental results based on several hyperspectral image data sets demonstrate that the proposed method can achieve better classification performance than some traditional methods, such as support vector machines and conventional deep learning-based methods.
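
As a concrete illustration of the architecture just described, here is a minimal PyTorch sketch of a five-layer spectral classifier (input, convolution, max pooling, full connection, output); the band count, class count, filter sizes, and activations are placeholder assumptions, not values taken from the paper.

```python
# Minimal sketch of a five-layer spectral CNN; 200 bands and 16 classes
# are placeholders, not values from the paper.
import torch
import torch.nn as nn

class SpectralCNN(nn.Module):
    def __init__(self, n_bands=200, n_classes=16):
        super().__init__()
        self.conv = nn.Conv1d(1, 20, kernel_size=11)   # convolutional layer
        self.pool = nn.MaxPool1d(kernel_size=3)        # max pooling layer
        pooled = (n_bands - 11 + 1) // 3
        self.fc = nn.Linear(20 * pooled, 100)          # full connection layer
        self.out = nn.Linear(100, n_classes)           # output layer

    def forward(self, x):            # x: (batch, n_bands)
        x = x.unsqueeze(1)           # -> (batch, 1, n_bands)
        x = torch.tanh(self.conv(x))
        x = self.pool(x)
        x = torch.tanh(self.fc(x.flatten(1)))
        return self.out(x)           # class logits per spectral signature

logits = SpectralCNN()(torch.randn(8, 200))  # 8 synthetic spectral signatures
```
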
Remote sensing imagery needs to be converted into tangible information which can be utilised in conjunction with other data sets, often within widely used Geographic Information Systems (GIS). As long as pixel sizes remained typically coarser than, or at best similar in size to, the objects of interest, emphasis was placed on per-pixel analysis, or even sub-pixel analysis for this conversion, but with increasing spatial resolutions alternative paths have been followed, aimed at deriving objects that are made up of several pixels. This paper gives an overview of the development of object based methods, which aim to delineate readily usable objects from imagery while at the same time combining image processing and GIS functionalities in order to utilize spectral and contextual information in an integrative way. The most common approach used for building objects is image segmentation, which dates back to the 1970s. Around the year 2000, GIS and image processing started to grow together rapidly through object based image analysis (OBIA - or GEOBIA for geospatial object based image analysis). In contrast to typical Landsat resolutions, high resolution images support several scales within the same image. Through a comprehensive literature review several thousand abstracts have been screened, and more than 820 OBIA-related articles, comprising 145 journal papers, 84 book chapters and nearly 600 conference papers, are analysed in detail. It becomes evident that the first years of the OBIA/GEOBIA developments were characterised by the dominance of 'grey' literature, but that the number of peer-reviewed journal articles has increased sharply over the last four to five years. The pixel paradigm is beginning to show cracks and the OBIA methods are making considerable progress towards a spatially explicit information extraction workflow, such as is required for spatial planning as well as for many monitoring programmes.
Two fuzzy versions of the k-means optimal, least squared error partitioning problem are formulated for finite subsets X of a general inner product space. In both cases, the extremizing solutions are shown to be fixed points of a certain operator T on the class of fuzzy, k-partitions of X, and simple iteration of T provides an algorithm which has the descent property relative to the least squared error criterion function. In the first case, the range of T consists largely of ordinary (i.e. non-fuzzy) partitions of X and the associated iteration scheme is essentially the well known ISODATA process of Ball and Hall. However, in the second case, the range of T consists mainly of fuzzy partitions and the associated algorithm is new; when X consists of k compact well separated (CWS) clusters Xi, this algorithm generates a limiting partition with membership functions which closely approximate the characteristic functions of the clusters Xi. However, when X is not the union of k CWS clusters, the limiting partition is truly fuzzy in the sense that the values of its component membership functions differ substantially from 0 or 1 over certain regions of X. Thus, unlike ISODATA, the "fuzzy" algorithm signals the presence or absence of CWS clusters in X. Furthermore, the fuzzy algorithm seems significantly less prone to the "cluster-splitting" tendency of ISODATA and may also be less easily diverted to uninteresting locally optimal partitions. Finally, for data sets X consisting of dense CWS clusters embedded in a diffuse background of strays, the structure of X is accurately reflected in the limiting partition generated by the fuzzy algorithm. Mathematical arguments and numerical results are offered in support of the foregoing assertions.
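
The iteration of the operator T described above is, in modern terms, the fuzzy c-means update; a compact NumPy sketch follows, assuming a fuzzifier m = 2 and Euclidean distances (both illustrative choices, not taken from the paper).

```python
import numpy as np

def fuzzy_kmeans(X, k, m=2.0, tol=1e-5, max_iter=300, seed=0):
    """Alternate membership/centroid updates with the descent property
    w.r.t. the least-squared-error criterion (fuzzifier m is illustrative)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), k))
    U /= U.sum(axis=1, keepdims=True)                 # fuzzy k-partition of X
    for _ in range(max_iter):
        W = U ** m
        C = (W.T @ X) / W.sum(axis=0)[:, None]        # weighted centroids
        D = np.linalg.norm(X[:, None, :] - C[None], axis=2) + 1e-12
        U_new = 1.0 / (D ** (2 / (m - 1)))
        U_new /= U_new.sum(axis=1, keepdims=True)     # membership update
        if np.abs(U_new - U).max() < tol:
            break
        U = U_new
    return U_new, C
```
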
Two separation indices are considered for partitions P = {X1, …, Xk} of a finite data set X in a general inner product space. Both indices increase as the pairwise distances between the subsets Xi become large compared to the diameters of the Xi. Maximally separated partitions P' are defined, and it is shown that as the indices of P' increase without bound, the characteristic functions of the Xi' in P' are approximated more and more closely by the membership functions in fuzzy partitions which minimize certain fuzzy extensions of the k-means squared error criterion function.
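
In modern notation, a separation index of this ratio type (now commonly called Dunn's index) can be written as follows; the exact form below is offered as an illustration rather than a quotation from the paper.

```latex
% Separation index of a partition P = {X_1, ..., X_k}: between-cluster
% distance relative to within-cluster diameter (illustrative form).
\[
  s(P) \;=\; \frac{\min_{i \neq j}\, d(X_i, X_j)}
                  {\max_{1 \le l \le k}\, \operatorname{diam}(X_l)},
  \qquad
  d(X_i, X_j) \;=\; \min_{x \in X_i,\; y \in X_j} \lVert x - y \rVert .
\]
```
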
Classification is one of the most popular topics in hyperspectral remote sensing. In the last two decades, a huge number of methods were proposed to deal with the hyperspectral data classification problem. However, most of them do not hierarchically extract deep features. In this paper, the concept of deep learning is introduced into hyperspectral data classification for the first time. First, we verify the eligibility of stacked autoencoders by following classical spectral information-based classification. Second, a new way of classifying with spatial-dominated information is proposed. We then propose a novel deep learning framework to merge the two features, from which we can get the highest classification accuracy. The framework is a hybrid of principal component analysis (PCA), deep learning architecture, and logistic regression. Specifically, as a deep learning architecture, stacked autoencoders are used to extract useful high-level features. Experimental results with widely used hyperspectral data indicate that classifiers built in this deep learning-based framework provide competitive performance. In addition, the proposed joint spectral-spatial deep neural network opens a new window for future research, showcasing the huge potential of deep learning-based methods for accurate hyperspectral data classification.
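
A minimal PyTorch sketch of the greedy layer-wise pretraining behind stacked autoencoders, whose top-level code would feed the logistic-regression classifier in the framework above; layer widths, epoch count, and the synthetic data are placeholder assumptions.

```python
import torch
import torch.nn as nn

# Greedy layer-wise pretraining of a stacked autoencoder on spectra;
# widths (200 -> 60 -> 20) are placeholders, not values from the paper.
def pretrain_layer(data, in_dim, out_dim, epochs=50, lr=1e-3):
    enc, dec = nn.Linear(in_dim, out_dim), nn.Linear(out_dim, in_dim)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(dec(torch.sigmoid(enc(data))), data)
        loss.backward()
        opt.step()
    return enc

X = torch.randn(1024, 200)              # synthetic spectra (placeholder data)
enc1 = pretrain_layer(X, 200, 60)
H1 = torch.sigmoid(enc1(X)).detach()
enc2 = pretrain_layer(H1, 60, 20)
codes = torch.sigmoid(enc2(H1))         # deep features for the classifier
clf = nn.Linear(20, 9)                  # logistic-regression output layer
```
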
A variety of procedures for change detection based on comparison of multitemporal digital remote sensing data have been developed. An evaluation of results indicates that various procedures of change detection produce different maps of change even in the same environment.
The support vector machine (SVM) is a group of theoretically superior machine learning algorithms. It was found competitive with the best available machine learning algorithms in classifying high-dimensional data sets. This paper gives an introduction to the theoretical development of the SVM and an experimental evaluation of its accuracy, stability, and training speed in deriving land cover classifications from satellite images. The SVM was compared with three other popular classifiers: the maximum likelihood classifier (MLC), neural network classifiers (NNC), and decision tree classifiers (DTC). The impacts of kernel configuration on the performance of the SVM, and of the selection of training data and input variables on all four classifiers, were also evaluated in this experiment.
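
The kernel-configuration experiment can be reproduced in outline with scikit-learn: an RBF-kernel SVM whose C and gamma are selected by cross-validated grid search. The synthetic features below stand in for per-pixel band values; the parameter grid is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for per-pixel band values and land-cover labels.
X, y = make_classification(n_samples=2000, n_features=7, n_classes=4,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Kernel configuration (C, gamma) chosen by cross-validated grid search.
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [1, 10, 100], "gamma": ["scale", 0.1, 1.0]}, cv=5)
grid.fit(X_tr, y_tr)
print(grid.best_params_, grid.score(X_te, y_te))
```
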
Hyperspectral remote sensing technology has advanced significantly in the past two decades. Current sensors onboard airborne and spaceborne platforms cover large areas of the Earth's surface with unprecedented spectral, spatial, and temporal resolutions. These characteristics enable a myriad of applications requiring fine identification of materials or estimation of physical parameters. Very often, these applications rely on sophisticated and complex data analysis methods. The main sources of difficulty are the high dimensionality and size of the hyperspectral data, the spectral mixing (linear and nonlinear), and the degradation mechanisms associated with the measurement process, such as noise and atmospheric effects. This paper presents a tutorial/overview cross section of some relevant hyperspectral data analysis methods and algorithms, organized in six main topics: data fusion, unmixing, classification, target detection, physical parameter retrieval, and fast computing. In all topics, we describe the state of the art, provide illustrative examples, and point to future challenges and research directions.
The amount of scientific literature on (Geographic) Object-based Image Analysis - GEOBIA has been and still is sharply increasing. These approaches to analysing imagery have antecedents in earlier research on image segmentation and use GIS-like spatial analysis within classification and feature extraction approaches. This article investigates these developments and their implications and asks whether or not this is a new paradigm in remote sensing and Geographic Information Science (GIScience). We first discuss several limitations of prevailing per-pixel methods when applied to high resolution images. Then we explore the paradigm concept developed by Kuhn (1962) and discuss whether GEOBIA can be regarded as a paradigm according to this definition. We crystallize core concepts of GEOBIA, including the role of objects, of ontologies and the multiplicity of scales, and we discuss how these conceptual developments support important methods in remote sensing such as change detection and accuracy assessment. The ramifications of the different theoretical foundations between the …
This paper presents the framework of kernel-based methods in the context of hyperspectral image classification, illustrating from a general viewpoint the main characteristics of different kernel-based approaches and analyzing their properties in the hyperspectral domain. In particular, we assess the performance of regularized radial basis function neural networks (Reg-RBFNN), standard support vector machines (SVMs), kernel Fisher discriminant (KFD) analysis, and regularized AdaBoost (Reg-AB). The novelty of this work consists in: 1) introducing Reg-RBFNN and Reg-AB for hyperspectral image classification; 2) comparing kernel-based methods by taking into account the peculiarities of hyperspectral images; and 3) clarifying their theoretical relationships. To these purposes, we focus on the accuracy of methods when working in noisy environments, high input dimension, and limited training sets. In addition, some other important issues are discussed, such as the sparsity of the solutions, the computational burden, and the capability of the methods to provide outputs that can be directly interpreted as probabilities.
Most applications of hyperspectral imagery require processing techniques which achieve two fundamental goals: 1) detect and classify the constituent materials for each pixel in the scene; 2) reduce the data volume/dimensionality, without loss of critical information, so that it can be processed efficiently and assimilated by a human analyst. The authors describe a technique which simultaneously reduces the data dimensionality, suppresses undesired or interfering spectral signatures, and detects the presence of a spectral signature of interest. The basic concept is to project each pixel vector onto a subspace which is orthogonal to the undesired signatures. This operation is an optimal interference suppression process in the least squares sense. Once the interfering signatures have been nulled, projecting the residual onto the signature of interest maximizes the signal-to-noise ratio and results in a single component image that represents a classification for the signature of interest. The orthogonal subspace projection (OSP) operator can be extended to k signatures of interest, thus reducing the dimensionality to k and classifying the hyperspectral image simultaneously. The approach is applicable to both spectrally pure as well as mixed pixels.
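
A NumPy sketch of the projection at the core of OSP: null the undesired signatures U, then correlate the residual with the signature of interest d. The band count and signatures below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 100                          # number of spectral bands (placeholder)
U = rng.random((L, 4))           # undesired/interfering signatures (placeholder)
d = rng.random(L)                # signature of interest (placeholder)

# P projects onto the subspace orthogonal to the columns of U:
# P = I - U (U^T U)^{-1} U^T  (pinv gives the same for full column rank).
P = np.eye(L) - U @ np.linalg.pinv(U)

def osp_detector(x):
    """Interference-suppressed matched filter for one pixel vector x."""
    return d @ P @ x / (d @ P @ d)

pixel = rng.random(L)
print(osp_detector(pixel))
```
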
Computer vision applications have come to rely increasingly on superpixels in recent years, but it is not always clear what constitutes a good superpixel algorithm. In an effort to understand the benefits and drawbacks of existing methods, we empirically compare five state-of-the-art superpixel algorithms for their ability to adhere to image boundaries, speed, memory efficiency, and their impact on segmentation performance. We then introduce a new superpixel algorithm, simple linear iterative clustering (SLIC), which adapts a k-means clustering approach to efficiently generate superpixels. Despite its simplicity, SLIC adheres to boundaries as well as or better than previous methods. At the same time, it is faster and more memory efficient, improves segmentation performance, and is straightforward to extend to supervoxel generation.
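
SLIC is implemented in scikit-image; a short usage sketch (segment count and compactness are illustrative values):

```python
from skimage import data, segmentation

img = data.astronaut()                     # sample RGB image
# k-means-style clustering in (color, x, y) space; compactness trades
# color similarity against spatial proximity.
labels = segmentation.slic(img, n_segments=250, compactness=10, start_label=1)
overlay = segmentation.mark_boundaries(img, labels)
print(labels.max(), "superpixels")
```
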
Multi-resolution image features may be approximated via extrapolation from nearby scales, rather than being computed explicitly. This fundamental insight allows us to design object detection algorithms that are as accurate as, and considerably faster than, the state of the art. The computational bottleneck of many modern detectors is the computation of features at every scale of a finely-sampled image pyramid. Our key insight is that one may compute finely sampled feature pyramids at a fraction of the cost, without sacrificing performance: for a broad family of features we find that features computed at octave-spaced scale intervals are sufficient to approximate features on a finely-sampled pyramid. Extrapolation is inexpensive as compared to direct feature computation. As a result, our approximation yields considerable speedups with negligible loss in detection accuracy. We modify three diverse visual recognition systems to use fast feature pyramids and show results on both pedestrian detection (measured on the Caltech, INRIA, TUD-Brussels and ETH data sets) and general object detection (measured on the PASCAL VOC). The approach is general and is widely applicable to vision algorithms requiring fine-grained multi-scale analysis. Our approximation is valid for images with broad spectra (most natural images) and fails for images with narrow band-pass spectra (e.g., periodic textures).
Linear spectral mixture analysis (LSMA) is a widely used technique in remote sensing to estimate abundance fractions of materials present in an image pixel. In order for an LSMA-based estimator to produce accurate amounts of material abundance, it generally requires two constraints imposed on the linear mixture model used in LSMA, which are the abundance sum-to-one constraint and the abundance nonnegativity constraint. The first constraint requires the sum of the abundance fractions of materials present in an image pixel to be one and the second imposes a constraint that these abundance fractions be nonnegative. While the first constraint is easy to deal with, the second constraint is difficult to implement since it results in a set of inequalities and can only be solved by numerical methods. Consequently, most LSMA-based methods are unconstrained and produce solutions that do not necessarily reflect the true abundance fractions of materials. In this case, they can only be used for the purposes of material detection, discrimination, and classification, but not for material quantification. The authors present a fully constrained least squares (FCLS) linear spectral mixture analysis method for material quantification. Since no closed form can be derived for this method, an efficient algorithm is developed to yield optimal solutions. In order to further apply the designed algorithm to unknown image scenes, an unsupervised least squares error (LSE)-based method is also proposed to extend the FCLS method in an unsupervised manner.
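
A common way to approximate the FCLS solution in practice is to enforce nonnegativity with NNLS while folding the sum-to-one constraint into an augmented system; the SciPy sketch below takes this route. It is an approximation under these assumptions, not the authors' exact algorithm, and the weight delta on the sum-to-one row is a tuning choice.

```python
import numpy as np
from scipy.optimize import nnls

def fcls(E, x, delta=1e3):
    """Abundances a >= 0 with sum(a) ~= 1 for endmember matrix E (bands x p)
    and pixel x, via an augmented nonnegative least-squares system."""
    p = E.shape[1]
    E_aug = np.vstack([E, delta * np.ones((1, p))])   # append sum-to-one row
    x_aug = np.append(x, delta)
    a, _ = nnls(E_aug, x_aug)
    return a

rng = np.random.default_rng(1)
E = rng.random((50, 3))                # three endmembers, 50 bands (placeholder)
x = E @ np.array([0.6, 0.3, 0.1])      # synthetic mixed pixel
print(fcls(E, x))                      # close to [0.6, 0.3, 0.1]
```
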
This paper addresses the problem of the classification of hyperspectral remote sensing images by support vector machines (SVMs). First, we propose a theoretical discussion and experimental analysis aimed at understanding and assessing the potentialities of SVM classifiers in hyperdimensional feature spaces. Then, we assess the effectiveness of SVMs with respect to conventional feature-reduction-based approaches and their performances in hypersubspaces of various dimensionalities. To sustain such an analysis, the performances of SVMs are compared with those of two other nonparametric classifiers (i.e., radial basis function neural networks and the K-nearest neighbor classifier). Finally, we study the potentially critical issue of applying binary SVMs to multiclass problems in hyperspectral data. In particular, four different multiclass strategies are analyzed and compared: the one-against-all, the one-against-one, and two hierarchical tree-based strategies. Different performance indicators have been used to support our experimental studies in a detailed and accurate way, i.e., the classification accuracy, the computational time, the stability to parameter setting, and the complexity of the multiclass architecture. The results obtained on a real Airborne Visible/Infrared Imaging Spectroradiometer hyperspectral dataset allow us to conclude that, whatever the multiclass strategy adopted, SVMs are a valid and effective alternative to conventional pattern recognition approaches (feature-reduction procedures combined with a classification method) for the classification of hyperspectral remote sensing data.
Techniques based on multi-temporal, multi-spectral, satellite-sensor-acquired data have demonstrated potential as a means to detect, identify, map and monitor ecosystem changes, irrespective of their causal agents. This review paper, which summarizes the methods and the results of digital change detection in the optical/infrared domain, has as its primary objective a synthesis of the state of the art today. It approaches digital change detection from three angles. First, the different perspectives from which the variability in ecosystems and the change events have been dealt with are summarized. Change detection between pairs of images (bi-temporal) as well as between time profiles of imagery derived indicators (temporal trajectories), and, where relevant, the appropriate choices for digital imagery acquisition timing and change interval length definition, are discussed. Second, pre-processing routines either to establish a more direct linkage between remote sensing data and biophysical phenomena, or to temporally mosaic imagery and extract time profiles, are reviewed. Third, the actual change detection methods themselves are categorized in an analytical framework and critically evaluated. Ultimately, the paper highlights how some of these methodological aspects are being fine-tuned as this review is being written, and we summarize the new developments that can be expected in the near future. The review highlights the high complementarity between different change detection methods.
A large number of algorithms have been proposed for feature subset selection. Our experimental results show that the sequential forward floating selection algorithm, proposed by Pudil et al. (1994), dominates the other algorithms tested. We study the problem of choosing an optimal feature set for land use classification based on SAR satellite images using four different texture models. Pooling features derived from different texture models, followed by feature selection, results in a substantial improvement in the classification accuracy. We also illustrate the dangers of using feature selection in small sample size situations.
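
A simplified sketch of sequential forward floating selection: after each forward step, backward "floating" steps remove features as long as doing so improves a cross-validated score. The k-NN classifier and 5-fold scoring are placeholder choices, not the setup used in the paper.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def sffs(X, y, n_select, clf=KNeighborsClassifier()):
    """Sequential forward floating selection (simplified sketch)."""
    score = lambda idx: cross_val_score(clf, X[:, idx], y, cv=5).mean()
    selected = []
    while len(selected) < n_select:
        # Forward step: add the single best remaining feature.
        rest = [j for j in range(X.shape[1]) if j not in selected]
        selected.append(max(rest, key=lambda j: score(selected + [j])))
        # Floating step: drop any feature whose removal improves the score.
        improved = True
        while improved and len(selected) > 2:
            improved = False
            for j in list(selected):
                trial = [f for f in selected if f != j]
                if score(trial) > score(selected):
                    selected = trial
                    improved = True
    return selected
```
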
Given a set of mixed spectral (multispectral or hyperspectral) vectors, linear spectral mixture analysis, or linear unmixing, aims at estimating the number of reference substances, also called endmembers, their spectral signatures, and their abundance fractions. This paper presents a new method for unsupervised endmember extraction from hyperspectral data, termed vertex component analysis (VCA). The algorithm exploits two facts: (1) the endmembers are the vertices of a simplex and (2) the affine transformation of a simplex is also a simplex. In a series of experiments using simulated and real data, the VCA algorithm competes with state-of-the-art methods, with a computational complexity between one and two orders of magnitude lower than the best available method.
Imaging spectrometers measure electromagnetic energy scattered in their instantaneous field of view in hundreds or thousands of spectral channels with higher spectral resolution than multispectral cameras. Imaging spectrometers are therefore often referred to as hyperspectral cameras (HSCs). Higher spectral resolution enables material identification via spectroscopic analysis, which facilitates countless applications that require identifying materials in scenarios unsuitable for classical spectroscopic analysis. Due to low spatial resolution of HSCs, microscopic material mixing, and multiple scattering, spectra measured by HSCs are mixtures of spectra of materials in a scene. Thus, accurate estimation requires unmixing. Pixels are assumed to be mixtures of a few materials, called endmembers. Unmixing involves estimating all or some of: the number of endmembers, their spectral signatures, and their abundances at each pixel. Unmixing is a challenging, ill-posed inverse problem because of model inaccuracies, observation noise, environmental conditions, endmember variability, and data set size. Researchers have devised and investigated many models searching for robust, stable, tractable, and accurate unmixing algorithms. This paper presents an overview of unmixing methods from the time of Keshava and Mustard's unmixing tutorial to the present. Mixing models are first discussed. Signal-subspace, geometrical, statistical, sparsity-based, and spatial-contextual unmixing algorithms are described. Mathematical problems and potential solutions are described. Algorithm characteristics are illustrated experimentally.
Timely and accurate change detection of Earth's surface features is extremely important for understanding relationships and interactions between human and natural phenomena in order to promote better decision making. Remote sensing data are primary sources extensively used for change detection in recent decades. Many change detection techniques have been developed. This paper summarizes and reviews these techniques. Previous literature has shown that image differencing, principal component analysis and post-classification comparison are the most common methods used for change detection. In recent years, spectral mixture analysis, artificial neural networks and integration of geographical information system and remote sensing data have become important techniques for change detection applications. Different change detection algorithms have their own merits and no single approach is optimal and applicable to all cases. In practice, different algorithms are often compared to find the best change detection results for a specific application. Research of change detection techniques is still an active topic and new techniques are needed to effectively use the increasingly diverse and complex remotely sensed data available or projected to be soon available from satellite and airborne sensors. This paper is a comprehensive exploration of all the major change detection approaches implemented as found in the literature. Abbreviations used in this paper: 6S, second simulation of the satellite signal in the solar spectrum; ANN, artificial neural networks; ASTER, Advanced Spaceborne Thermal Emission and Reflection Radiometer; AVHRR, Advanced Very High Resolution Radiometer; AVIRIS, Airborne Visible/Infrared Imaging Spectrometer; CVA, change vector analysis; EM, expectation–maximization algorithm; ERS-1, Earth Resource Satellite-1; ETM+, Enhanced Thematic Mapper Plus, Landsat 7 satellite image; GIS, Geographical Information System; GS, Gramm–Schmidt transformation; J-M distance, Jeffries–Matusita distance; KT, Kauth–Thomas transformation or tasselled cap transformation; LSMA, linear spectral mixture analysis; LULC, land use and land cover; MODIS, Moderate Resolution Imaging Spectroradiometer; MSAVI, Modified Soil Adjusted Vegetation Index; MSS, Landsat Multi-Spectral Scanner image; NDMI, Normalized Difference Moisture Index; NDVI, Normalized Difference Vegetation Index; NOAA, National Oceanic and Atmospheric Administration; PCA, principal component analysis; RGB, red, green and blue colour composite; RTB, ratio of tree biomass to total aboveground biomass; SAR, synthetic aperture radar; SAVI, Soil Adjusted Vegetation Index; SPOT HRV, Satellite Probatoire d'Observation de la Terre (SPOT) high resolution visible image; TM, Thematic Mapper; VI, Vegetation Index.
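
Image differencing, the most common technique named above, reduces to thresholding a band difference; a NumPy sketch with a mean plus/minus k standard deviations decision rule (k is an analyst-chosen constant):

```python
import numpy as np

def difference_change_map(band_t1, band_t2, k=2.0):
    """Bi-temporal image differencing: flag pixels whose difference
    deviates more than k standard deviations from the mean difference."""
    diff = band_t2.astype(float) - band_t1.astype(float)
    mu, sigma = diff.mean(), diff.std()
    return np.abs(diff - mu) > k * sigma              # boolean change mask

t1 = np.random.default_rng(0).random((100, 100))      # co-registered images
t2 = t1.copy()
t2[40:60, 40:60] += 0.8                               # simulated change patch
print(difference_change_map(t1, t2).sum(), "changed pixels")
```
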
Image classification is a complex process that may be affected by many factors. This paper examines current practices, problems, and prospects of image classification. The emphasis is placed on the summarization of major advanced classification approaches and the techniques used for improving classification accuracy. In addition, some important issues affecting classification performance are discussed. This literature review suggests that designing a suitable image-processing procedure is a prerequisite for a successful classification of remotely sensed data into a thematic map. Effective use of multiple features of remotely sensed data and the selection of a suitable classification method are especially significant for improving classification accuracy. Non-parametric classifiers such as neural network, decision tree classifier, and knowledge-based classification have increasingly become important approaches for multisource data classification. Integration of remote sensing, geographical information systems (GIS), and expert system emerges as a new research frontier. More research, however, is needed to identify and reduce uncertainties in the image-processing chain to improve classification accuracy.
Detecting regions of change in multiple images of the same scene taken at different times is of widespread interest due to a large number of applications in diverse disciplines, including remote sensing, surveillance, medical diagnosis and treatment, civil infrastructure, and underwater sensing. This paper presents a systematic survey of the common processing steps and core decision rules in modern change detection algorithms, including significance and hypothesis testing, predictive models, the shading model, and background modeling. We also discuss important preprocessing methods, approaches to enforcing the consistency of the change mask, and principles for evaluating and comparing the performance of change detection algorithms. It is hoped that our classification of algorithms into a relatively small number of categories will provide useful guidance to the algorithm designer.
Spectral unmixing using hyperspectral data represents a significant step in the evolution of remote decompositional analysis that began with multispectral sensing. It is a consequence of collecting data in greater and greater quantities and the desire to extract more detailed information about the material composition of surfaces. Linear mixing is the key assumption that has permitted well-known algorithms to be adapted to the unmixing problem. In fact, the resemblance of the linear mixing model to system models in other areas has permitted a significant legacy of algorithms from a wide range of applications to be adapted to unmixing. However, it is still unclear whether the assumption of linearity is sufficient to model the mixing process in every application of interest. It is clear, however, that the applicability of models and techniques is highly dependent on the variety of circumstances and factors that give rise to mixed pixels. The outputs of spectral unmixing, endmember and abundance estimates, are important for identifying the material composition of mixtures.
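
The linear mixing assumption referred to above is usually written as follows, with the two physical abundance constraints made explicit:

```latex
% Linear mixing model for an L-band pixel x with p endmember signatures e_i
% (columns of E), abundances a_i, and noise n.
\[
  \mathbf{x} \;=\; \sum_{i=1}^{p} a_i\, \mathbf{e}_i + \mathbf{n}
             \;=\; \mathbf{E}\,\mathbf{a} + \mathbf{n},
  \qquad a_i \ge 0, \qquad \sum_{i=1}^{p} a_i = 1 .
\]
```
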
Deep-learning (DL) algorithms, which learn the representative and discriminative features in a hierarchical manner from the data, have recently become a hotspot in the machine-learning area and have been introduced into the geoscience and remote sensing (RS) community for RS big data analysis. Considering the low-level features (e.g., spectral and texture) as the bottom level, the output feature representation from the top level of the network can be directly fed into a subsequent classifier for pixel-based classification. As a matter of fact, by carefully addressing the practical demands in RS applications and designing the input–output levels of the whole network, we have found that DL is actually everywhere in RS data analysis: from the traditional topics of image preprocessing, pixel-based classification, and target recognition, to the recent challenging tasks of high-level semantic feature extraction and RS scene understanding. In this technical tutorial, a general framework of DL for RS data is provided, and the state-of-the-art DL methods in RS are regarded as special cases of input-output data combined with various deep networks and tuning tricks. Although extensive experimental results confirm the excellent performance of the DL-based algorithms in RS big data analysis, even more exciting prospects can be expected for DL in RS. Key bottlenecks and potential directions are also indicated in this article, guiding further research into DL for RS data.
Due to the advantages of deep learning, in this paper, a regularized deep feature extraction (FE) method is presented for hyperspectral image (HSI) classification using a convolutional neural network (CNN). The proposed approach employs several convolutional and pooling layers to extract deep features from HSIs, which are nonlinear, discriminant, and invariant. These features are useful for image classification and target detection. Furthermore, in order to address the common issue of imbalance between high dimensionality and limited availability of training samples for the classification of HSI, a few strategies such as L2 regularization and dropout are investigated to avoid overfitting in class data modeling. More importantly, we propose a 3-D CNN-based FE model with combined regularization to extract effective spectral-spatial features of hyperspectral imagery. Finally, in order to further improve the performance, a virtual sample enhanced method is proposed. The proposed approaches are carried out on three widely used hyperspectral data sets: Indian Pines, University of Pavia, and Kennedy Space Center. The obtained results reveal that the proposed models with sparse constraints provide competitive results to state-of-the-art methods. In addition, the proposed deep FE opens a new window for further research.
Object detection in very high resolution optical remote sensing images is a fundamental problem in remote sensing image analysis. Due to the advances of powerful feature representations, machine-learning-based object detection is receiving increasing attention. Although numerous feature representations exist, most of them are handcrafted or shallow-learning-based features. As the object detection task becomes more challenging, their description capability becomes limited or even impoverished. More recently, deep learning algorithms, especially convolutional neural networks (CNNs), have shown their much stronger feature representation power in computer vision. Despite the progress made in natural scene images, it is problematic to directly use the CNN feature for object detection in optical remote sensing images because it is difficult to effectively deal with the problem of object rotation variations. To address this problem, this paper proposes a novel and effective approach to learn a rotation-invariant CNN (RICNN) model for advancing the performance of object detection, which is achieved by introducing and learning a new rotation-invariant layer on the basis of the existing CNN architectures. However, different from the training of traditional CNN models that only optimizes the multinomial logistic regression objective, our RICNN model is trained by optimizing a new objective function via imposing a regularization constraint, which explicitly enforces the feature representations of the training samples before and after rotating to be mapped close to each other, hence achieving rotation invariance. To facilitate training, we first train the rotation-invariant layer and then domain-specifically fine-tune the whole RICNN network to further boost the performance. Comprehensive evaluations on a publicly available ten-class object detection data set demonstrate the effectiveness of the proposed method.
In this paper, we designed an end-to-end spectral-spatial residual network (SSRN) that takes raw 3-D cubes as input data without feature engineering for hyperspectral image classification. In this network, the spectral and spatial residual blocks consecutively learn discriminative features from abundant spectral signatures and spatial contexts in hyperspectral imagery (HSI). The proposed SSRN is a supervised deep learning framework that alleviates the declining-accuracy phenomenon of other deep learning models. Specifically, the residual blocks connect every other 3-D convolutional layer through identity mapping, which facilitates the backpropagation of gradients. Furthermore, we impose batch normalization on every convolutional layer to regularize the learning process and improve the classification performance of trained models. Quantitative and qualitative results demonstrate that the SSRN achieved the state-of-the-art HSI classification accuracy in agricultural, rural-urban, and urban data sets: Indian Pines, Kennedy Space Center, and University of Pavia.
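
The building block described above, 3-D convolutions joined by identity mappings with batch normalization on every layer, can be sketched in PyTorch; the channel count and spectral kernel size below are placeholders rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SpectralResBlock(nn.Module):
    """Residual block over the spectral axis: two 3-D convolutions with
    batch normalization, joined to the input by an identity mapping.
    Channel count and kernel depth are placeholders."""
    def __init__(self, channels=24, spectral_kernel=7):
        super().__init__()
        pad = (spectral_kernel // 2, 0, 0)
        self.conv1 = nn.Conv3d(channels, channels, (spectral_kernel, 1, 1), padding=pad)
        self.bn1 = nn.BatchNorm3d(channels)
        self.conv2 = nn.Conv3d(channels, channels, (spectral_kernel, 1, 1), padding=pad)
        self.bn2 = nn.BatchNorm3d(channels)

    def forward(self, x):               # x: (batch, ch, bands, H, W)
        y = torch.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return torch.relu(x + y)        # identity mapping eases backpropagation

out = SpectralResBlock()(torch.randn(2, 24, 97, 7, 7))
```
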
Standing at the paradigm shift towards data-intensive science, machine learning techniques are becoming increasingly important. In particular, as a major breakthrough in the field, deep learning has proven as an extremely powerful tool in many fields. Shall we embrace deep learning as the key to all? Or, should we resist a 'black-box' solution? There are controversial opinions in the remote sensing community. In this article, we analyze the challenges of using deep learning for remote sensing data analysis, review the recent advances, and provide resources to make deep learning in remote sensing ridiculously simple to start with. More importantly, we advocate remote sensing scientists to bring their expertise into deep learning, and use it as an implicit general model to tackle unprecedented large-scale influential challenges, such as climate change and urbanization.
Machine learning offers the potential for effective and efficient classification of remotely sensed imagery. The strengths of machine learning include the capacity to handle data of high dimensionality and to map classes with very complex characteristics. Nevertheless, implementing a machine-learning classification is not straightforward, and the literature provides conflicting advice regarding many key issues. This article therefore provides an overview of machine learning from an applied perspective. We focus on the relatively mature methods of support vector machines, single decision trees (DTs), Random Forests, boosted DTs, artificial neural networks, and k-nearest neighbours (k-NN). Issues considered include the choice of algorithm, training data requirements, user-defined parameter selection and optimization, feature space impacts and reduction, and computational costs. We illustrate these issues through applying machine-learning classification to two publicly available remotely sensed data sets.
Deep learning (DL) algorithms have seen a massive rise in popularity for remote-sensing image analysis over the past few years. In this study, the major DL concepts pertinent to remote sensing are introduced, and more than 200 publications in this field, most of which were published during the last two years, are reviewed and analyzed. Initially, a meta-analysis was conducted to analyze the status of remote sensing DL studies in terms of the study targets, DL model(s) used, image spatial resolution(s), type of study area, and level of classification accuracy achieved. Subsequently, a detailed review is conducted to describe/discuss how DL has been applied for remote sensing image analysis tasks including image fusion, image registration, scene classification, object detection, land use and land cover (LULC) classification, segmentation, and object-based image analysis (OBIA). This review covers nearly every application and technology in the field of remote sensing, ranging from preprocessing to mapping. Finally, a conclusion regarding the current state-of-the-art methods, a critical conclusion on open challenges, and directions for future research are presented.
Marine debris is persistent solid material in the water. Oceans contain several varieties of organic marine debris, but massive levels of man-made marine trash threaten their biological equilibrium. Manually scanning the ocean for garbage is time-consuming and inefficient, making it uneconomical. In our work, deep learning, which is more efficient than manual methods, is used to detect marine debris in satellite imagery. Deep learning algorithms have been successful in semantic segmentation; however, marine debris detection using satellite imagery has been underexplored. The lack of comprehensive marine debris datasets until recently and the complexity of multispectral satellite images are to blame. Our segmentation method, using the UNet architecture with a ResNeXt50 backbone, exceeds the existing state of the art on the Marine Debris Archive (MARIDA) dataset, a dataset of 11-band Sentinel-2 satellite image patches. The hybrid solution combines ResNeXt50's increased feature extraction with UNet's global and local context preservation, which is crucial in satellite images of floating bodies due to marine debris' movement pattern. We achieved benchmark mean pixel accuracy, IoU, and F1 scores, and an 88% recall in categorizing marine debris pixels, a 10% improvement over the state of the art. This work attempts to advance deep learning algorithms for remote sensing and move closer to cleaner oceans.
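
The UNet-plus-ResNeXt50 combination described above matches what the segmentation_models_pytorch library exposes out of the box; a usage sketch, with the class count as a placeholder and in_channels=11 for the Sentinel-2 bands of MARIDA patches:

```python
import torch
import segmentation_models_pytorch as smp

# UNet decoder with a ResNeXt-50 encoder; 11 input channels for the
# Sentinel-2 bands in MARIDA patches. Class count is a placeholder.
model = smp.Unet(encoder_name="resnext50_32x4d", encoder_weights=None,
                 in_channels=11, classes=12)
mask_logits = model(torch.randn(2, 11, 256, 256))   # -> (2, 12, 256, 256)
```
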
S. Liu, Chunli Zhu, Lintao Peng, et al., International Journal of Applied Earth Observation and Geoinformation.
Object detection for remote sensing images is one of the promising research directions in computer vision. Applications for object detection from remote sensing images play an important role in analyzing aerial or satellite imagery. Benefits include applications in monitoring buildings and infrastructure, transportation, supporting search and rescue or responding to natural disasters, and environmental research. However, detecting objects in remote sensing images is difficult due to the diversity of shapes and sizes, viewing angles of objects, and complex background environments. In this paper, the authors present a Deep Learning (DL)-based object detection process for remotely sensed images, the main goal of which is to improve the ability to detect small objects in high-resolution aerial images. We implement and evaluate a super-slicing inference technique in the YOLOv11 model to improve the ability to detect very small and extremely small objects. Extensive experiments were conducted on the problem of detecting and tracking vehicles in Vietnam (Thai Nguyen). The results show that the system can accurately detect small objects such as pedestrians, motorbikes, and cars at a distance, with confidence ranging from 0.31 to 0.90. Some detections succeed even when the object lies at the edge of a slice. Finally, the authors discuss potential future research directions and open problems.
Mainstream deep learning segmentation models are designed for small-sized images, and when applied to high-resolution remote sensing images, the limited information contained in small-sized images greatly restricts a model’s ability to capture complex contextual information at a global scale. To mitigate this challenge, we present RPFusionNet, a novel parallel semantic segmentation framework that is specifically designed to efficiently integrate both local and global features. RPFusionNet leverages two distinct feature representations: REGION (representing large areas) and PATCH (representing smaller regions). This framework comprises two parallel branches: the REGION branch initially downsamples the entire image, then extracts features via a convolutional neural network (CNN)-based encoder, and subsequently captures multi-level information using pooled kernels of varying sizes. This design enables the model to adapt effectively to objects of different scales. In contrast, the PATCH branch utilizes a pixel-level feature extractor to enrich the high-dimensional features of the local region, thereby enhancing the representation of fine-grained details. To model the semantic correlation between the two branches, we have developed the Region–Patch scale fusion module. This module ensures that the network can comprehend a wider range of image contexts while preserving local details, thus bridging the gap between regional and local information. Extensive experiments were conducted on three public datasets: WBDS, AIDS, and Vaihingen. Compared to other state-of-the-art methods, our network achieved the highest accuracy on all three datasets, with an IoU score of 92.08% on the WBDS dataset, 89.99% on the AIDS dataset, and 88.44% on the Vaihingen dataset.
The clever eye (CE) algorithm has been introduced for target detection in remote sensing image processing. It originally proposes the concept of data origin and can achieve the lowest average output energy compared to both the classical constrained energy minimization (CEM) and matched filter (MF) methods. In addition, it has been theoretically proven that the solutions of the best data origins can be attributed to solving a linear equation, which makes it computationally efficient. However, CE is only designed for single-target detection cases, while multiple-target detection is more demanding in real applications. In this paper, by naturally extending CE to a multiple-target case, we propose a unified algorithm termed multi-target clever eye (MTCE). The theoretical results in CE prompt us to consider an interesting question: do the MTCE solutions also share a similar structure to those of CE? Aiming to answer this question, we investigate a class of unconstrained non-convex optimization problems, where both the CE and MTCE models serve as special cases, which, interestingly, can also be utilized to solve a more generalized linear system. In addition, we further prove that all these solutions are globally optimal. In this sense, the analytical solutions of this generalized model can be deduced. Therefore, a unified framework is provided to deal with such a non-convex optimization problem, where both the solutions of MTCE and CE can be succinctly derived. Furthermore, its computational complexity is of the same magnitude as that of the other multiple-target-based methods. Experiments on both simulations and real hyperspectral remote sensing data verify our theoretical conclusions, and the comparison of quantitative metrics also demonstrates the advantage of our proposed MTCE method in multiple-target detection.
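
For reference, the classical CEM filter that CE generalizes minimizes the average output energy subject to a unit response on the target; a NumPy sketch (pixels and signature are random placeholders):

```python
import numpy as np

def cem(X, d):
    """Constrained energy minimization: w = R^{-1} d / (d^T R^{-1} d),
    where R is the sample correlation matrix of the pixels in X (N x L)."""
    R = X.T @ X / len(X)
    Rinv_d = np.linalg.solve(R, d)
    w = Rinv_d / (d @ Rinv_d)          # unit response on the target signature
    return X @ w                       # detector output per pixel

rng = np.random.default_rng(0)
X = rng.random((500, 50))              # 500 pixels, 50 bands (placeholder)
d = rng.random(50)                     # target signature (placeholder)
scores = cem(X, d)
```
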
Oriented object detection in remote sensing images is a particularly challenging task, especially when it involves detecting tiny, densely arranged, or occluded objects. Moreover, such remote sensing images are often susceptible to noise, which significantly increases the difficulty of the task. To address these challenges, we introduce the Wavelet-Domain Adaptive Receptive Field Network (WDARFNet), a novel architecture that combines Convolutional Neural Networks (CNNs) with Discrete Wavelet Transform (DWT) to enhance feature extraction and noise robustness. WDARFNet employs DWT to decompose feature maps into four distinct frequency components. Through ablation experiments, we demonstrate that selectively combining specific high-frequency and low-frequency features enhances the network’s representational capacity. Discarding diagonal high-frequency features, which contain significant noise, further enhances the model’s noise robustness. In addition, to capture long-range contextual information and adapt to varying object sizes and occlusions, WDARFNet incorporates a selective kernel mechanism. This strategy dynamically adjusts the receptive field based on the varying shapes of objects, ensuring optimal feature extraction for diverse objects. The streamlined and efficient WDARFNet achieves state-of-the-art performance on three challenging remote sensing object detection benchmarks: DOTA-v1.0, DIOR-R, and HRSC2016.
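
The wavelet decomposition used above splits a feature map into one low-frequency and three high-frequency components; with PyWavelets this looks as follows, discarding the noise-heavy diagonal band as in the ablation described (the Haar wavelet and map size are illustrative):

```python
import numpy as np
import pywt

fmap = np.random.default_rng(0).random((64, 64))    # feature map (placeholder)

# One-level 2-D DWT: approximation LL plus horizontal/vertical/diagonal
# high-frequency details (LH, HL, HH).
LL, (LH, HL, HH) = pywt.dwt2(fmap, "haar")

# Keep LL, LH, HL; discard the noise-heavy diagonal band HH.
kept = np.stack([LL, LH, HL])
print(kept.shape)                                   # (3, 32, 32)
```
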
Hyperspectral unmixing is a fundamental task in remote sensing that aims to decompose mixed spectral pixels into their constituent materials and estimate their respective abundances. Traditional methods often face limitations due to nonlinear mixing effects and the lack of pure pixels. This paper proposes a deep learning-based comparative framework that integrates modified versions of the Minimum Simplex Convolutional Network (MiSiCNet) and Unsupervised Deep Image Prior (UnDIP), referred to as Blind MiSiCNet and Supervised UnDIP, respectively, to achieve robust unmixing and abundance estimation in real-world scenarios. The proposed Blind MiSiCNet removes downsampling and upsampling layers and leverages spatial and geometric priors through convolutional layers with a minimum simplex volume constraint, producing reliable initial abundance maps even in the absence of pure pixels. The Supervised UnDIP variant further refines these estimates using the implicit regularization of convolutional neural networks, incorporating known endmembers to generate noise-free and spatially coherent abundance maps. Experimental results on the real-world Jasper Ridge dataset demonstrate that the proposed supervised and blind unmixing methods, evaluated comparatively, significantly outperform existing approaches regarding Spectral Angle Distance (SAD) and Root Mean Square Error (RMSE). The results also highlight improved noise robustness and better preservation of spatial structures.
With the continuous development of remote sensing technology and deep learning, change detection methods based on high-resolution remote sensing images are gradually evolving towards intelligence and high precision. Starting from the theoretical foundation of remote sensing image change detection, this paper systematically reviews the technical framework and typical models of deep learning, and focuses on analysing its application modes in image alignment, feature extraction and bi-temporal analysis. In addition, the integration of multi-source remote sensing data and model adaptation are discussed from the perspective of pixel-level and object-level modelling, which provides a theoretical basis and methodological support for improving the accuracy and stability of change detection. The study shows that deep learning has a powerful characterisation capability and is an important development direction for change detection in remote sensing images in the future.
Very high resolution (VHR) remote sensing change detection (CD) is crucial for monitoring Earth's dynamics but faces challenges in capturing fine-grained changes and distinguishing them from pseudo-changes due to varying acquisition conditions. Existing deep learning methods often suffer from information loss due to downsampling, which obscures fine details, and from a lack of filter adaptability to spatial heterogeneity. To address these issues, we introduce the Information-Preserving Adaptive Convolutional Network (IPACN). IPACN features a novel Information-Preserving Backbone (IPB), leveraging principles adapted from reversible networks to minimize feature degradation during hierarchical bi-temporal feature extraction, enhancing the preservation of fine spatial details essential for accurate change delineation. Crucially, IPACN incorporates a Frequency-Adaptive Difference Enhancement Module (FADEM) that applies adaptive filtering, informed by frequency analysis concepts, directly to the bi-temporal difference features. The FADEM dynamically refines change signals based on local spectral characteristics, improving discrimination. This synergistic approach, combining high-fidelity feature preservation (IPB) with adaptive difference refinement (FADEM), yields robust change representations. Comprehensive experiments on benchmark datasets demonstrate that IPACN achieves state-of-the-art performance, showing significant improvements in F1 score and IoU, enhanced boundary delineation, and improved robustness against pseudo-changes, offering an effective solution for very high resolution remote sensing CD.
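The F1 and IoU figures reported for binary change maps are typically computed from pixel-level true positives, false positives, and false negatives, as in this small sketch (variable names are illustrative):

```python
# Standard pixel-level F1 and IoU for binary change maps; pred and gt are
# boolean arrays where True marks "changed" pixels.
import numpy as np

def f1_and_iou(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    f1 = 2 * tp / (2 * tp + fp + fn + 1e-12)
    iou = tp / (tp + fp + fn + 1e-12)
    return float(f1), float(iou)

pred = np.array([[True, False], [True, True]])
gt   = np.array([[True, False], [False, True]])
print(f1_and_iou(pred, gt))   # (0.8, 0.666...)
```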
The talk will wrap up the experiences from R&D projects that were aimed at applying machine learning to real business needs, with a particular focus on emerging topics. It will cover image processing and anomaly detection tasks performed on airborne and satellite imagery, as well as on sensor and telemetry read-outs.
Synthetic aperture radar (SAR) is now recognized as a critical source of observational data in domains such as military reconnaissance, maritime monitoring, and disaster response, owing to its ability to deliver fine spatial resolution and broad-area imaging irrespective of weather or daylight conditions [...]
Due to the large dimensionality and the scarcity of labeled training samples, classifying hyperspectral images (HSIs) is a complex process. We propose an innovative approach to HSI classification based on multi-view deep learning that integrates spectral and spatial characteristics while utilizing only a few labeled instances, in order to overcome these difficulties. To extract a set of spectral and spatial features, we initially process the original HSI. Each spectral vector represents the spectral signature of a single image pixel. A deep Siamese convolutional neural network is employed to extract spatial features from the data; this network reduces the high dimensionality of the data by accounting for each pixel's immediate neighborhood. The spectral local texture characteristics are extracted using a sparse autoencoder, which has a minimal computational burden, to create multi-feature vectors. Next, we propose a multi-view approach: the non-subsampled contourlet transform model combines the spectral and spatial information extracted from the HSI into a single latent representation space. Classification is then performed using a deep learning-based system called CenterNet. The effectiveness of the proposed method is verified on the Salinas, WHU-OHS, and Pavia University datasets. Comparative experiments show that, relative to current state-of-the-art techniques, the proposed strategy generalizes better with small samples and produces higher classification accuracy, presenting a novel viewpoint on the processing of HSIs.
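As a loose illustration of the Siamese spatial branch the abstract describes, the sketch below encodes two pixel-centered neighborhoods with shared convolutional weights; the layer sizes and patch size are illustrative assumptions, not the paper's architecture.

```python
# Minimal Siamese patch encoder sketch: two pixel neighborhoods pass
# through the same (weight-shared) convolutional encoder.
import torch
import torch.nn as nn

class SiameseBranch(nn.Module):
    def __init__(self, bands: int, feat_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(bands, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, patch_a, patch_b):
        # Shared weights: both neighborhoods use the same encoder.
        return self.encoder(patch_a), self.encoder(patch_b)

net = SiameseBranch(bands=103)          # Pavia University has 103 bands
a = torch.randn(4, 103, 7, 7)           # 7x7 pixel neighborhoods (assumed size)
b = torch.randn(4, 103, 7, 7)
fa, fb = net(a, b)
print(fa.shape, fb.shape)               # torch.Size([4, 64]) each
```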
Aiming at the problems that the semantic representation of information extracted by the shallow layer of the current remote sensing image scene classification network is insufficient, and that the utilization rate of primary visual features decreases with the deepening of the network layers, this paper designs a multi-scale reverse master–slave encoder network (RMSENet). It proposes a reverse cross-scale supplementation strategy for the slave encoder and a reverse cross-scale fusion strategy for the master encoder. This not only reversely supplements the high-level semantic information extracted by the slave encoder to the shallow layer of the master encoder network in a cross-scale manner but also realizes the cross-scale fusion of features at all stages of the master encoder. A multi-frequency coordinate channel attention mechanism is proposed, which captures the inter-channel interactions of input feature maps while embedding spatial position information and rich frequency information. A multi-scale wavelet self-attention mechanism is proposed, which completes lossless downsampling of input feature maps before self-attention operations. Experiments on open-source datasets RSSCN7, SIRI-WHU, and AID show that the classification accuracies of RMSENet reach 97.41%, 97.61%, and 95.9%, respectively. Compared with current mainstream deep learning models, RMSENet has lower network complexity and excellent classification accuracy.
Disaster events occur around the world and cause significant damage to human life and property. Earth observation (EO) data enables rapid and comprehensive building damage assessment, an essential capability in the aftermath of a disaster for reducing human casualties and informing disaster relief efforts. Recent research focuses on developing artificial intelligence (AI) models to accurately map unseen disaster events, mostly using optical EO data. These solutions based on optical data are limited to clear skies and daylight hours, preventing a prompt response to disasters. Integrating multimodal EO data, particularly combining optical and synthetic aperture radar (SAR) imagery, makes it possible to provide all-weather, day-and-night disaster responses. Despite this potential, the lack of suitable benchmark datasets has constrained the development of robust multimodal AI models. In this paper, we present a Building damage assessment dataset using veRy-hIGH-resoluTion optical and SAR imagery (BRIGHT) to support AI-based all-weather disaster response. To the best of our knowledge, BRIGHT is the first open-access, globally distributed, event-diverse multimodal dataset specifically curated to support AI-based disaster response. It covers five types of natural disasters and two types of human-made disasters across 14 regions worldwide, focusing on developing countries where external assistance is most needed. The dataset's optical and SAR images, with spatial resolutions between 0.3 and 1 meters, provide detailed representations of individual buildings, making it ideal for precise damage assessment. We train seven advanced AI models on BRIGHT to validate transferability and robustness. Beyond that, it also serves as a challenging benchmark for a variety of tasks in real-world disaster scenarios, including unsupervised domain adaptation, semi-supervised learning, unsupervised multimodal change detection, and unsupervised multimodal image matching. The experimental results serve as baselines to inspire future research and model development. The dataset (Chen et al., 2025), along with the code and pretrained models, is available at https://github.com/ChenHongruixuan/BRIGHT and will be updated as and when new disaster data become available. BRIGHT also serves as the official dataset for the 2025 IEEE GRSS Data Fusion Contest Track II. We hope that this effort will promote the development of AI-driven methods in support of people in disaster-affected areas.
Remote Sensing Scene Classification (RSSC) is an important and challenging research topic. Transformer-based methods have shown encouraging performance in capturing global dependencies. However, recent studies have revealed that Transformers perform poorly in capturing high frequencies that mainly convey local information. To solve this problem, we propose a novel method based on High-Frequency Enhanced Vision Transformer and Multi-Layer Context Learning (HETMCL), which can effectively learn the comprehensive features of high-frequency and low-frequency information in visual data. First, Convolutional Neural Networks (CNNs) extract low-level spatial structures, and the Adjacent Layer Feature Fusion Module (AFFM) reduces semantic gaps between layers to enhance spatial context. Second, the High-Frequency Information Enhancement Vision Transformer (HFIE) includes a High-to-Low-Frequency Token Mixer (HLFTM), which captures high-frequency details. Finally, the Multi-Layer Context Alignment Attention (MCAA) integrates multi-layer features and contextual relationships. On UCM, AID, and NWPU datasets, HETMCL achieves state-of-the-art OA of 99.76%, 97.32%, and 95.02%, respectively, outperforming existing methods by up to 0.38%.
Land cover and land use studies play an important role in regional socioeconomic development and natural resource management. They support sustainable development by tracking changes in vegetation, freshwater quantity and quality, land resources, and coastal areas. This paper monitors Iraq's land use and land cover (LULC) with remote sensing data over the period 2019–2023, performing LULC classification and time series analysis on Sentinel-2 satellite imagery for the years 2019 and 2023 to identify changes over time. Remote sensing data are used to address the challenge of detecting land cover change in Iraq through SVM classification, with the aim of developing a fundamental method for mapping and monitoring these changes, encouraging sustainable land use practices, and advancing the United Nations Sustainable Development Goals. Land cover was categorized into five main classes: Water, Barren, Building, Vegetation, and Rangeland. The study showed a marked increase in urbanization, most of it occurring in previously bare soils at the edges of cities and driven primarily by population growth and economic development. These findings have major implications for urban planning, green space management, and sustainable city development. According to the data up to October 2023, barren land and built-up areas increased by 8% and 11%, respectively, while vegetation coverage decreased by 27%, indicating a significant loss of green area; the water category also increased by 9%. Applying a Support Vector Machine (SVM) for the LULC classification yielded a satisfactory accuracy assessment (OA: 93.11%). The study lays the foundation for ongoing monitoring of LULC changes in Iraq.
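A minimal sketch of the SVM classification step follows; a real workflow would read Sentinel-2 band values and labeled training polygons, which are mocked here with random data, and the kernel and hyperparameters are illustrative assumptions.

```python
# Sketch of pixel-wise SVM classification for the five LULC classes named
# in the abstract; features and labels are mocked.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

CLASSES = ["Water", "Barren", "Building", "Vegetation", "Rangeland"]

rng = np.random.default_rng(0)
X = rng.random((500, 10))                     # 10 band values per pixel (mock)
y = rng.integers(0, len(CLASSES), size=500)   # mock class labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_tr, y_tr)
print("OA:", accuracy_score(y_te, clf.predict(X_te)))
```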
To effectively utilize the rich spectral information of hyperspectral remote sensing images (HRSIs), the fractional Fourier transform (FRFT) feature of HRSIs is proposed to reflect the time-domain and frequency-domain characteristics of a spectral pixel simultaneously, and an FRFT order selection criterion is also proposed based on maximizing separability. Firstly, FRFT is applied to the spectral pixels, and the amplitude spectrum is taken as the FRFT feature of HRSIs. The FRFT feature is mixed with the pixel spectrum to form the presented spectral and fractional Fourier transform mixed feature (SF2MF), which contains time–frequency mixing information and spectral information of pixels. K-nearest neighbor, logistic regression, and random forest classifiers are used to verify the superiority of the proposed feature. A 1-dimensional convolutional neural network (1D-CNN) and a two-branch CNN network (Two-CNNSF2MF-Spa) are designed to extract the deep SF2MF feature and the SF2MF-spatial joint feature, respectively. Moreover, to compensate for the fact that CNNs cannot effectively capture the long-range features of spectral pixels, a long short-term memory (LSTM) network is combined with the CNN to form a two-branch network, C-CLSTMSF2MF, for extracting deeper and more efficient fusion features. A 3D-CNNSF2MF model is designed, which first performs principal component analysis on the spa-SF2MF cube containing spatial information and then feeds it into the 3-dimensional convolutional neural network 3D-CNNSF2MF to extract the SF2MF-spatial joint feature effectively. The experimental results on three real HRSIs show that the presented mixed feature SF2MF can effectively improve classification accuracy.
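To make the SF2MF construction concrete: the amplitude spectrum of the transformed pixel is concatenated with the raw spectrum. A full fractional transform is not implemented in this sketch; at order a = 1 the FRFT reduces to the ordinary discrete Fourier transform, which is used here as a stand-in, so the actual order selection of the paper is not reproduced.

```python
# Sketch of the mixed feature: raw spectrum concatenated with the
# transform's amplitude spectrum (ordinary DFT as the order-1 FRFT case).
import numpy as np

def mixed_feature(spectrum: np.ndarray) -> np.ndarray:
    amplitude = np.abs(np.fft.fft(spectrum))   # amplitude spectrum
    return np.concatenate([spectrum, amplitude])

pixel = np.random.rand(200)                    # one 200-band spectral pixel (mock)
print(mixed_feature(pixel).shape)              # (400,)
```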
Heterogeneous change detection refers to using image data from different sensors or modalities to detect change information in the same region by comparing images acquired at different times. In recent years, methods based on deep learning and domain adaptation have become mainstream, as they can effectively improve the accuracy and robustness of heterogeneous image change detection through feature alignment and multimodal data fusion. However, a lack of credible labels has prevented most current learning-based heterogeneous change detection methods from being deployed in practice. To overcome this limitation, a weakly supervised heterogeneous change detection framework with a structure similarity-guided sample generating (S3G2) strategy is proposed, which employs differential structure similarity to acquire prior information for iteratively generating reliable pseudo-labels. Moreover, a Statistical Difference representation Transformer (SDFormer) is proposed to lower the influence of modality differences between bitemporal heterogeneous imagery and better extract relevant change information. Extensive experiments have been carried out to fully investigate the influence of internal manual parameters and to compare the proposed methods with state-of-the-art approaches on several public heterogeneous change detection data sets. The experimental results indicate that the proposed methods show competitive performance.
Land Use and Land Cover (LULC) classification plays a pivotal role in a range of applications, including urban planning, environmental monitoring, agricultural management, and climate change analysis. With the rapid advancement in satellite and aerial remote sensing technologies, vast volumes of high-resolution multispectral and hyperspectral imagery are now accessible. While deep learning has emerged as a powerful tool for automating LULC classification with high accuracy, its black-box nature poses significant challenges to transparency, trustworthiness, and adoption in critical domains. This research proposes an interpretable deep learning framework that not only delivers accurate LULC classification but also ensures model explainability through the use of SHAP (SHapley Additive exPlanations), a game-theoretic approach to interpreting machine learning models. The proposed framework integrates a convolutional neural network (CNN)-based architecture trained on satellite imagery to classify different land cover types such as vegetation, water bodies, built-up areas, barren land, and agricultural fields. The CNN is trained using a labeled remote sensing dataset, such as Sentinel-2 or Landsat 8 imagery, with preprocessing steps including radiometric calibration, normalization, and augmentation to handle class imbalances and improve generalization.
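A minimal sketch of attaching SHAP to a small CNN classifier in this spirit is shown below; the model architecture and data are placeholders, and it assumes the shap library's DeepExplainer supports the model in question (the paper's own setup may differ).

```python
# Sketch: SHAP attributions for a placeholder CNN LULC classifier.
import torch
import torch.nn as nn
import shap

model = nn.Sequential(
    nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 5),                           # 5 land cover classes (assumed)
)
model.eval()

background = torch.randn(20, 4, 32, 32)         # reference patches, 4 bands (mock)
test_patches = torch.randn(3, 4, 32, 32)        # patches to explain (mock)

explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(test_patches)   # per-class pixel attributions
```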
In recent years, the continuous development of deep learning has significantly advanced its application in the field of remote sensing. However, the semantic segmentation of high-resolution remote sensing images remains challenging due to the presence of multi-scale objects and intricate spatial details, often leading to the loss of critical information during segmentation. To address this issue and enable fast and accurate segmentation of remote sensing images, we made improvements based on SegNet and named the enhanced model CSNet. CSNet is built upon the SegNet architecture and incorporates a coordinate attention (CA) mechanism, which enables the network to focus on salient features and capture global spatial information, thereby improving segmentation accuracy and facilitating the recovery of spatial structures. Furthermore, skip connections are introduced between the encoder and decoder to directly transfer low-level features to the decoder. This promotes the fusion of semantic information at different levels, enhances the recovery of fine-grained details, and optimizes the gradient flow during training, effectively mitigating the vanishing gradient problem and improving training efficiency. Additionally, a hybrid loss function combining weighted cross-entropy and Dice loss is employed. To address the issue of class imbalance, several categories within the dataset are merged, and samples with an excessively high proportion of background pixels are removed. These strategies significantly enhance the segmentation performance, particularly for small-sample classes. Experimental results from the Five-Billion-Pixels dataset demonstrate that, while introducing only a modest increase in parameters compared to SegNet, CSNet achieves superior segmentation performance in terms of overall classification accuracy, boundary delineation, and detail preservation, outperforming established methods such as U-Net, FCN, DeepLabv3+, SegNet, ViT, HRNet, and BiFormer.
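The hybrid loss described here is commonly implemented as a weighted sum of the two terms; the sketch below follows that standard recipe, with the class weights and mixing factor as illustrative assumptions rather than the paper's values.

```python
# Sketch of a weighted cross-entropy + Dice hybrid loss for segmentation.
import torch
import torch.nn.functional as F

def hybrid_loss(logits, target, class_weights, alpha=0.5, eps=1e-6):
    # logits: (N, C, H, W); target: (N, H, W) integer class labels
    ce = F.cross_entropy(logits, target, weight=class_weights)
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(target, logits.shape[1]).permute(0, 3, 1, 2).float()
    inter = (probs * one_hot).sum(dim=(0, 2, 3))
    denom = probs.sum(dim=(0, 2, 3)) + one_hot.sum(dim=(0, 2, 3))
    dice = 1.0 - ((2 * inter + eps) / (denom + eps)).mean()
    return alpha * ce + (1 - alpha) * dice     # mixing factor is an assumption

logits = torch.randn(2, 6, 64, 64)             # 6 classes (mock)
target = torch.randint(0, 6, (2, 64, 64))
print(hybrid_loss(logits, target, torch.ones(6)))
```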
The accurate identification of water bodies in hyperspectral images (HSIs) remains challenging due to hierarchical representation imbalances in deep learning models, where shallow layers overly focus on spectral features, boundary ambiguities caused by the relatively low spatial resolution of satellite imagery, and limited detection capability for small-scale aquatic features such as narrow rivers. To address these challenges, this study proposes Heuristic–Adaptive Spectral–Spatial Neural Architecture Search with Dynamic Cell Evaluation (HASSDE-NAS). The architecture integrates three specialized units: a spectral-aware dynamic band selection cell suppresses redundant spectral bands, a geometry-enhanced edge attention cell refines fragmented spatial boundaries, and a bidirectional fusion alignment cell jointly optimizes spectral and spatial dependencies. A heuristic cell search algorithm optimizes the network architecture through architecture stability, feature diversity, and gradient sensitivity analysis, which improves search efficiency and model robustness. Evaluated on the Gaofen-5 datasets from the Guangdong and Henan regions, HASSDE-NAS achieves overall accuracies of 92.61% and 96%, respectively. This approach outperforms existing methods in delineating narrow river systems and resolving water bodies with weak spectral contrast under complex backgrounds, such as vegetation or cloud shadows. By adaptively prioritizing task-relevant features, the framework provides an interpretable solution for hydrological monitoring and advances neural architecture search in intelligent remote sensing.
Variations in facial complexion serve as a telltale sign of underlying health conditions. Precisely categorizing facial complexions poses a significant challenge due to the subtle distinctions in facial features. Three multi-feature facial complexion classification algorithms leveraging convolutional neural networks (CNNs) are proposed. They fuse, splice, or independently train the features extracted from distinct facial regions of interest (ROI), respectively. Innovative frameworks of the three algorithms can more effectively exploit facial features, improving the utilization rate of feature information and classification performance. We trained and validated the three algorithms on the dataset consisting of 721 facial images that we had collected and preprocessed. The comprehensive evaluation reveals that multi-feature fusion and splicing classification algorithms achieve accuracies of 95.98% and 93.76%, respectively. The optimal approach combining multi-feature CNN with machine learning algorithms attains a remarkable accuracy of 97.78%. Additionally, these experiments proved that the multidomain combination was crucial, and the arrangement of ROI features, including the nose, forehead, philtrum, and right and left cheek, was the optimal choice for classification. Furthermore, we employed the EfficientNet model for training on the face image as a whole, which achieves a classification accuracy of 89.37%. The difference in accuracy underscores the superiority and efficacy of multi-feature classification algorithms. The employment of multi-feature fusion algorithms in facial complexion classification holds substantial advantages, ushering in fresh research directions in the field of facial complexion classification and deep learning.
In recent years, the frequency and intensity of natural disasters have increased worldwide. These disasters cause significant economic losses, with building damage accounting for a substantial portion. Post-disaster response plans involve inventories of building damage. However, these assessments can be highly subjective, require substantial time, and can expose the inspectors to unsafe environments. Automation of building damage assessment by applying deep learning combined with advanced remote sensing technology is currently an active research topic. These efforts are hindered by the limited amount of high-quality training datasets for each disaster type (e.g., hurricane, wildfire). Buildings damaged by different disasters may show distinct damage patterns due to differing damage mechanisms, posing challenges to data integration across disaster types and model development. To investigate these issues, this study explores the interrelationship between wildfire and hurricane data by developing models suited to wildfire and hurricane datasets both individually and jointly, as well as by combining various backbones and deep learning models. Our approach includes semantic segmentation for pixel-level damage assessment and analyzing model sensitivity with different amounts of training data. Ultimately, this study provides a solution to the limited data available for training building damage assessment deep learning models by providing a comparative analysis of the inter-applicability of wildfire and hurricane data. A notable finding is that when using a small portion of data through transfer learning, data and deep learning models from the other disaster types can be leveraged.
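One plausible shape for the cross-disaster transfer-learning setup mentioned here is to freeze the encoder of a model trained on one disaster type and fine-tune the remaining layers on a small sample of the other; the model choice and the checkpoint path below are hypothetical, not from the study.

```python
# Sketch of fine-tuning a segmentation model across disaster types with a
# frozen encoder; fcn_resnet50 and the checkpoint name are placeholders.
import torch
from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(num_classes=4)   # e.g. 4 damage levels (assumed)
# model.load_state_dict(torch.load("wildfire_pretrained.pt"))  # hypothetical checkpoint

for p in model.backbone.parameters():  # freeze the encoder
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
# ...fine-tune for a few epochs on the small hurricane (or wildfire) subset...
```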
Satellite cloud images exhibit complex multidimensional characteristics, including spectral, textural, and spatiotemporal dynamics. The temporal evolution of cloud systems plays a crucial role in accurate classification, particularly under the coexistence of multiple weather systems. However, most existing models, such as those based on convolutional neural networks (CNNs), Transformer architectures, and their variants like Swin Transformer, primarily focus on spatial modeling of static images and do not explicitly incorporate temporal information, thereby limiting their ability to effectively integrate spatiotemporal features. To address this limitation, we propose SIG-ShapeFormer, a novel classification model specifically designed for satellite cloud images with temporal continuity. To the best of our knowledge, this work is the first to transform satellite cloud data into multivariate time series and introduce a unified framework for multi-scale and multimodal feature fusion. SIG-ShapeFormer consists of three core components: (1) a Shapelet-based module that captures discriminative and interpretable local temporal patterns; (2) a multi-scale Inception module combining 1D convolutions and Transformer encoders to extract temporal features across different scales; and (3) a differentially enhanced Gramian Angular Summation Field (GASF) module that converts time series into 2D texture representations, significantly improving the recognition of cloud internal structures. Experimental results demonstrate that SIG-ShapeFormer achieves a classification accuracy of 99.36% on the LSCIDMR-S dataset, outperforming the original ShapeFormer by 2.2% and outperforming other CNN- or Transformer-based models. Moreover, the model exhibits strong generalization performance on the UCM remote sensing dataset and several benchmark tasks from the UEA time-series archive. SIG-ShapeFormer is particularly suitable for remote sensing applications involving continuous temporal sequences, such as extreme weather warnings and dynamic cloud system monitoring. However, it relies on temporally coherent input data and may perform suboptimally when applied to datasets with limited or irregular temporal resolution.
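The GASF step named in component (3) has a standard construction: rescale the series to [-1, 1], map each value to an angle, and form the matrix of summed-angle cosines. The sketch below follows that standard recipe (without the paper's differential enhancement):

```python
# Standard Gramian Angular Summation Field: rescale to [-1, 1], encode as
# angles, and form GASF[i, j] = cos(phi_i + phi_j) as a 2-D texture image.
import numpy as np

def gasf(series: np.ndarray) -> np.ndarray:
    lo, hi = series.min(), series.max()
    x = 2 * (series - lo) / (hi - lo) - 1.0    # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))     # angular encoding
    return np.cos(phi[:, None] + phi[None, :]) # summed-angle cosine matrix

ts = np.sin(np.linspace(0, 4 * np.pi, 64))     # mock brightness time series
print(gasf(ts).shape)                          # (64, 64)
```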