Chapter III 
Classifying segmented hyperspectral data from a heterogeneous urban environment

Journal of Applied Remote Sensing 1 (2007) 013543 Sebastian van der Linden, Andreas Janz, Björn Waske, Michael Eiden and Patrick Hostert © 2007 Society of Photo-Optical Instrumentation Engineers doi: 10.1117/1.2813466 received 28 March 2007; revised 15 October 2007; accepted 16 October 2007. This paper was published in Journal of Applied Remote Sensing and is made available as an electronic reprint with permission of SPIE. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.

Abstract

Classifying remotely sensed images from urban environments is challenging. Urban land cover classes are spectrally heterogeneous and materials from different classes have similar spectral properties. Image segmentation has become a common preprocessing step that helps to overcome such problems. However, little attention has been paid to impacts of segmentation on the data’s spectral information content. Here, urban hyperspectral data is spectrally classified using support vector machines (SVM). By training an SVM on pixel information and applying it to the image before segmentation and after segmentation at different levels, the classification framework is maintained and the influence of the spectral generalization during image segmentation is hence directly investigated. In addition, a straightforward multi-level approach is performed, which combines information from different levels into one final map. A stratified accuracy assessment by urban structure types is applied. The classification of the unsegmented data achieves an overall accuracy of 88.7%. Accuracy of the segment-based classification is lower and decreases with increasing segment size. Highest accuracies for the different urban structure types are achieved at varying segmentation levels. The accuracy of the multi-level approach is similar to that of unsegmented data but comprises the positive effects of more homogeneous segment-based classifications at different levels in one map.

Chapter III:1 Introduction

The number of remote sensing applications in urban areas has significantly increased over the past years (Roessner et al., 2001; Benediktsson et al., 2005; Small et al., 2005; Lu and Weng, 2006). This development is mainly driven by two factors: first, the rapid global urbanization process raises the demand for time- and cost-effective space- and airborne monitoring (UN, 2006). Second, the spatial resolution of recently available remote sensing imagery allows an accurate representation of urban structures (Small, 2003; Bruzzone and Carlin, 2006). Despite the fine spatial resolution of new sensors, urban areas remain challenging to study with remotely sensed data. Spectral properties of the urban environment influence the performance of image analyses like the classification of land cover types. The number of surface materials and hence the spectral heterogeneity in urban imagery is very high, and spectrally similar materials might occur on different surface types, e.g. tar roofs and asphalt roads (Herold et al., 2004). Common multispectral sensor configurations as used for IKONOS or Landsat Thematic Mapper are not sufficient to differentiate such urban categories. In addition, the high-frequency spatial patterns of urban reflectance suggest a relatively high number of mixed pixels (Small, 2003). A detailed classification of urban areas thus requires data of high spectral and spatial resolution, as for example provided by airborne imaging spectrometers like the Hyperspectral Mapper (HyMap).

Various authors improve urban classifications of spectrally ambiguous surfaces by using additional information like census data on population density (Lu and Weng, 2006), LiDAR information on surface structure (Hodgson et al., 2003), or texture measures like extended morphological profiles (Benediktsson et al., 2005). In Shackelford and Davis (2003), results of a pixel-based spectral classification are combined with the classification of a segmented image that is based on information on the segments’ shapes and neighborhoods; in doing so, a higher accuracy for typical urban classes like buildings, roads and other impervious surfaces is achieved. The successful incorporation of segment properties into the classification of urban areas is described by various authors (Damm et al., 2005; Bruzzone and Carlin, 2006; Diermayer et al., 2006; Schöpfer and Moeller, 2006).

In the context of segmentation-based analysis, little attention has so far been paid to its influence on spectral image information: during the segmentation process, segments are assigned the mean spectral value of their constituent pixels as primary spectral information. The effect of this spectral generalization can hardly be predicted. On the one hand, noise or unwanted detail will be eliminated, but on the other hand, important spectral information might be removed. To directly investigate the influence of the spectral generalization, an image needs to be classified with and without prior segmentation, while all other basic conditions are maintained, i.e. the same classifier with identical training must be used. Most studies, however, compare pixel- and segment-based classifications of optical images under varying conditions, either by using different classifiers (Wang et al., 2004) or by incorporating additional features into the segment-based classification (Song et al., 2005; Bruzzone and Carlin, 2006).

The present work investigates the impacts of image segmentation on the purely spectral classification of a hyperspectral data set from a large heterogeneous urban environment. The influence of image segmentation is assessed at different scales and for different urban structure types. The conceptual framework for this investigation and the classification approach are described in Section 2. Section 3 explains the image and training data as well as the methods used for segmentation and classification. Results of the experiments are shown in Section 4 and discussed in Section 5. The paper ends with concluding remarks in Section 6.

Chapter III:2 Conceptual framework

Chapter III:2.1 Segment-based classification

From a spectral perspective, image segmentation, especially region-growing approaches, can be considered a locally optimized generalization procedure: adjacent pixels from presumably homogeneous areas are merged into image segments. The original spectral information is reduced to a mean value, which is then assigned to the corresponding image segment. Ideally, segment outlines follow boundaries of natural objects and possible spectral heterogeneity within this object is intentionally eliminated. This has an important impact on subsequent processing, e.g. the classification of the data: the spectral feature space is modified by averaging the spectral information from adjacent pixels, and classification results are different. If segment outlines match those of natural objects, the pixels’ original spectral information is changed towards values that are more representative for the object as a whole and presumably its class. The confusion between overlapping classes will decrease, and the produced maps appear more homogeneous and are easier to perceive. The positive effect is weakened when segments are smaller than natural objects, but – more importantly – it turns into a disadvantage when segments are too large and include pixels that belong to adjacent natural objects from different classes. In this case, spectral values from different classes are averaged, i.e. confusion increases. The following possible disadvantages of segment-based classification are summarized by Song et al. (2005): an inaccurate segmentation will not improve classification, the classification error is accumulated due to errors in segmentation and classification, and the misclassification of a segment means a misclassification of all pixels of the object.

Simultaneously with the generalization of the segments’ spectral properties, segment-specific features are generated, e.g. shape features, textural information, and relationships between segments. The availability of this additional information is an important advantage of many segment-based approaches over pixel-based classifications (Damm et al., 2005; Bruzzone and Carlin, 2006). Regardless of this advantage, however, it is desirable to make best use of the spectral information in segment-based approaches. The airborne hyperspectral data from a heterogeneous urban environment as used in this work is ideal to investigate the spectral properties of segmented image data: the high spectral information content promises high classification accuracy, even for critical land cover classes like built-up and non built-up impervious surfaces. At a spatial resolution of 4 m the quality of the image segmentation is expected to be influenced by mixed pixels, especially in areas with small characteristic spatial scales and high local spectral variance (Woodcock and Strahler, 1987; Small, 2003). Spectral similarity between adjacent objects from different classes will additionally complicate the analysis. Moreover, the various urban structure types in the data show very different spatial properties and patterns, and they are not assumed to be well represented by a single segmentation level.

The segment-based classification in this work is set up to directly investigate the influence of spectral generalization. First, the unsegmented image is classified using an SVM classifier (Fig. III-1, left). Then, segmented images with different levels of aggregation are individually classified using the SVM that was previously trained on the pixel information of the unsegmented image (Fig. III-1, center). This way, the differences between the two approaches are reduced to the data to be classified and the effects of image segmentation. Following this analysis of individual segmentation levels, a multi-level approach is performed to test whether positive impacts of segment-based classification at varying levels can be combined into one map by a multi-level classification (Fig. III-1, right). Intermediate results, i.e. rule images, of the previous SVM classifications are combined and a single map is derived (for details see Section 2.2). This straightforward approach does not require a supervised training at different scales or the definition of relationships between different segmentation levels by the user.

Figure III-1: Flowchart of the pixel-based (left), segment-based (center), and multi-level approach (right). The SVM for both the pixel- and segment-based approach were trained on pixel data. For details on SVM classification see Section 2.2.

Chapter III:2.2 Support vector machine classification

A spectral classification approach for heterogeneous urban environments should fulfill two requirements: (1) the chosen algorithm has to be capable of describing multi-modal classes, i.e. heterogeneous classes including more than one cluster in the feature space; (2) the classification of the smooth transition zones between classes is critical, due to the ambiguity of the classes’ spectral information and the high number of mixed pixels. Over the past two decades, a variety of non-parametric classifiers has been introduced into remote sensing image analysis, e.g. artificial neural networks (Benediktsson et al., 1990), decision tree classifiers (Friedl and Brodley, 1997), and support vector machines (SVM) (Huang et al., 2002; Foody and Mathur, 2004). These do not assume specific class distributions and are thus well suited for complex environments or approaches using fused data sets. SVM are one of the more recent developments in the field of machine learning. In most cases they outperformed other approaches under varying conditions, or performed at least equally well (Huang et al., 2002; Foody and Mathur, 2004; Melgani and Bruzzone, 2004; Pal and Mather, 2006). In particular, SVM have been shown to be insensitive to high data dimensionality and robust in terms of small training sample sizes (Melgani and Bruzzone, 2004; Pal and Mather, 2006).

SVM delineate two classes by fitting an optimal separating hyperplane to the training data in the d-dimensional feature space (Vapnik, 1998). They are based on structural risk minimization: a hyperplane is optimal when it minimizes a cost function that expresses a combination of (1) maximizing the margin, i.e. the distance between the hyperplane and the closest training samples, and (2) minimizing the error on training samples that cannot be separated (Bruzzone and Carlin, 2006). The influence of the non-separable samples is controlled by a regularization parameter C. For cases that are not linearly separable, the input data are implicitly mapped into a higher dimensional space by a kernel function, e.g. a Gaussian radial basis function (RBF). More precisely, the kernel function is integrated into the optimization of the cost function in such a way that only dot products between sample vectors in the high dimensional space are computed. The parameters of the kernel function are chosen to allow the best possible fitting of the hyperplane. For the RBF kernel this is the parameter γ that controls the width of the Gaussian function. A detailed description of the concept of SVM and the formulation of the problem is given in Burges (1998); comprehensive introductions in a remote sensing context are given in Huang et al. (2002), Foody and Mathur (2004), and Melgani and Bruzzone (2004).
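
The role of γ and of the kernel-based decision value can be illustrated with a minimal sketch. It assumes the common RBF form K(x, y) = exp(−γ‖x − y‖²) and the standard kernel expansion of the decision function; the function names, support vectors, and coefficients are hypothetical and are not taken from the study.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def decision_value(x, support_vectors, coefficients, bias, gamma):
    """Signed decision value f(x) = sum_i coef_i * K(sv_i, x) + bias.

    coef_i stands for alpha_i * y_i of a trained binary SVM. The sign of
    f(x) gives the predicted side of the separating hyperplane.
    """
    kernel_sum = sum(
        coefficient * rbf_kernel(support_vector, x, gamma)
        for support_vector, coefficient in zip(support_vectors, coefficients)
    )
    return kernel_sum + bias

# Hypothetical two-band toy example (values are illustrative only).
support_vectors = [[0.10, 0.40], [0.30, 0.20]]
coefficients = [1.0, -1.0]
value = decision_value([0.10, 0.40], support_vectors, coefficients,
                       bias=0.0, gamma=10.0)
```

A larger γ narrows the Gaussian, so each support vector influences a smaller neighborhood in feature space; a smaller γ yields a smoother decision surface.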

Two main strategies exist to solve multi-class problems with originally binary SVM: the one-against-one (OAO) and the one-against-all (OAA) strategy (Huang et al., 2002; Foody and Mathur, 2004). Additional approaches are described in Melgani and Bruzzone (2004) and Hsu and Lin (2002). In this work the OAA approach was preferred, since first tests showed no significant differences to other approaches in terms of accuracy. In addition, the suggested multi-level classification could easily be performed on intermediate results of the OAA strategy: SVM produce an image that shows the distance of pixels to the separating hyperplanes for each binary problem. In the OAA approach, a set of such images, from now on referred to as rule images in analogy to other classifiers, is generated to individually separate each class from the remaining ones, e.g. vegetation from the rest. The final class label is then determined by comparing the values in the rule images and selecting the maximum value. For the multi-level approach, rule images from different segmentation levels that were generated during the segment-based classification were averaged for each binary case. The final map was derived by applying the maximum value decision to the resulting mean values of each OAA case (Fig. III-1, right). The success of a combined use of SVM rule images for data fusion has previously been demonstrated for multi-sensor data (Waske and Benediktsson, 2007).
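
The maximum value decision and the averaging of rule images can be sketched as follows for a single pixel. This is an illustration under assumptions, not the implementation used in this work: the rule values are invented, only three classes are shown, and segment-level rule values are assumed to have already been mapped back to the pixel position.

```python
def classify_oaa(rule_values):
    """Return the class whose one-against-all rule value is largest."""
    return max(rule_values, key=rule_values.get)

def fuse_levels(rule_values_per_level):
    """Average the rule values of each binary OAA case over all levels."""
    classes = rule_values_per_level[0].keys()
    n_levels = len(rule_values_per_level)
    return {
        name: sum(level[name] for level in rule_values_per_level) / n_levels
        for name in classes
    }

# Hypothetical rule values of one pixel at pixel level and at two
# segmentation levels (illustrative numbers only).
levels = [
    {"vegetation": -0.8, "built-up": 0.6, "impervious": 0.2},
    {"vegetation": -0.9, "built-up": 0.1, "impervious": 0.5},
    {"vegetation": -1.0, "built-up": -0.1, "impervious": 0.7},
]

per_level_labels = [classify_oaa(level) for level in levels]
fused_label = classify_oaa(fuse_levels(levels))
```

In this toy case the pixel-level decision (built-up) differs from the two segment-level decisions, and the averaged rule values settle on impervious. Because only rule values are averaged, no additional training at the individual levels is needed.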

The classifications of this work aim to map five typical urban land cover categories: vegetation, built-up areas, non built-up impervious areas, non-vegetated pervious areas and water. Built-up areas in particular include all kinds of roof materials at different illumination conditions, in parts being specular reflectors, and hence show a multi-modal spectral distribution. Non built-up impervious surfaces comprise all other artificial surfaces like roads, sidewalks, other open spaces, plus cars, railroad tracks, or trains. By defining such spectrally heterogeneous classes the ability of SVM to delineate complex class distributions is tested.

Chapter III:2.3 Stratified accuracy assessment of the support vector machine classifications

The classification accuracy is expected to vary between different urban structure types due to the unequal distribution of phenomena like shadow, the portion of mixed pixels depending on the average size of spatial structures, or the abundance of the spectrally more distinct classes vegetation and water. For a thorough validation of the SVM classification in urban areas with heterogeneous structural composition, map accuracy has to be assessed following an adapted strategy. Thus, urban structure types like the central business district, industrial and commercial grounds, residential areas of different densities, and suburban areas will be stratified and individually validated.

When segmenting data that comprises different urban structure types, the quality of segmentation results can be expected to vary as a consequence of variations in the size of natural objects and the spectral contrast to adjacent surfaces. Possible positive or negative impacts will more than likely occur simultaneously. Thus, the stratified accuracy assessment of the segment-based classifications functions as an indirect measure of segmentation quality for the corresponding regions in the image. This way, general information on appropriate average segment sizes for the analysis of different urban areas shall be derived. By reducing the description of segmentation results to the value of overall average segment size, a flexible measure is used that is independent from the segmentation algorithm and that might also function as a guideline for the work with data sets from other urban areas.
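
A stratified assessment of this kind amounts to computing the accuracy separately for the reference pixels of each structure type. The following sketch assumes that each reference pixel is available as a (stratum, reference label, mapped label) tuple; this data layout and the sample values are hypothetical.

```python
from collections import defaultdict

def stratified_accuracy(samples):
    """Overall accuracy [%] per stratum.

    samples: iterable of (stratum, reference_label, mapped_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for stratum, reference_label, mapped_label in samples:
        total[stratum] += 1
        if reference_label == mapped_label:
            correct[stratum] += 1
    return {stratum: 100.0 * correct[stratum] / total[stratum]
            for stratum in total}

# Hypothetical reference pixels (illustrative only).
samples = [
    ("center", "built-up", "built-up"),
    ("center", "impervious", "built-up"),
    ("suburban", "vegetation", "vegetation"),
    ("suburban", "pervious", "pervious"),
]
result = stratified_accuracy(samples)
```

Running the same function on the maps of each segmentation level gives one accuracy value per structure type and level, which is the form of comparison described above.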

Chapter III:3 Material and methods

Chapter III:3.1 HyMap imagery and data preprocessing

The airborne imaging spectrometer HyMap acquires data between 0.4 and 2.5 µm in 128 spectral bands. Its spatial resolution is 3.9 by 4.5 m at nadir when operated at 1,930 m. The sensor’s field-of-view (FOV) is 61.3°. The HyMap flight line that is classified in the present work was acquired over Berlin, Germany on 20 June 2005 at around 10:46 am Central European Summer Time. The flight direction was East-West at 256°; the center of the 512 by 7,277 pixel scene is located at E 392254 and N 5820441 in UTM zone 33. An area of 32.5 by 2.2 km is covered, including a great variety of urban structure types: the governmental district, residential areas of different densities and ages, recreational areas, suburban areas towards the city’s borders, industrial grounds, as well as large apartment complexes and wide boulevards from the socialist era. In addition, agricultural areas, forest patches and water bodies are present.

The data set was corrected for atmospheric effects and transferred to reflectance values (Richter and Schläpfer, 2002). The number of bands was reduced to 114 based on the signal-to-noise ratio. View-angle dependent brightness gradients that are caused by anisotropic surface reflectance were eliminated following an approach for urban hyperspectral data (Schiefer et al., 2006; Chapter II of this work). Geometric correction was not performed to avoid spatial resampling and the interpolation of spectral information.

Chapter III:3.2 Image segmentation 

Image segmentation was performed using the region merging approach suggested by Baatz and Schaepe (2000). Although other region-growing approaches (Evans et al., 2002) and edge-delineation approaches (Rydberg and Borgefors, 2001) exist, this approach is most frequently used in remote sensing (e.g. Hodgson et al., 2003; Shackelford and Davis, 2003). A detailed description of the underlying formulae can be found in Bruzzone and Carlin (2006). In general, the spectral variance within user-defined bands and the compactness or smoothness of generated segments control the termination of the segmentation process. In this work segment shape was not utilized and only spectral information was used, in accordance with the focus of the analysis. The segmentation was performed on the first 20 principal components, since segmentation of all 114 spectral bands was not feasible. Segment outlines were then transferred onto the original spectral data. Ten segmented images with average segment sizes between 2.4 and 21.4 pixels were generated using increasing values for the termination criterion (Fig. III-2).

The spectral information from the segmented images was stored in a generic band sequential file format, where every pixel corresponds to a segment. The band values of each segment represent the average spectral information of its constituent pixels in the respective band. The segments are stored sequentially according to an index number they receive during the segmentation process. In order to re-localize the segments after image processing a separate file with the spatial positions of the indices is used. The generic format enables a software-independent processing of the spectral information derived from the segmentation process. At the same time, the physical data size is reduced during spectral generalization, i.e. spectrally compressed, and processing speed hence increased – an important, but so far neglected side effect of segment-based analysis, especially in the case of large data volumes.
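
The storage scheme described above can be sketched in a few lines: each segment index receives the band-wise mean of its pixels, and a separate lookup of pixel positions allows results to be written back into image geometry. The list-based data layout and all names are assumptions for illustration; the actual file format is not reproduced here.

```python
def segment_means(spectra, segment_ids):
    """Band-wise mean spectrum per segment plus pixel positions per segment.

    spectra: list of per-pixel band value lists (flattened image).
    segment_ids: list with the segment index of each pixel.
    """
    sums = {}
    positions = {}
    for position, (spectrum, segment) in enumerate(zip(spectra, segment_ids)):
        if segment not in sums:
            sums[segment] = [0.0] * len(spectrum)
            positions[segment] = []
        for band, value in enumerate(spectrum):
            sums[segment][band] += value
        positions[segment].append(position)
    means = {
        segment: [total / len(positions[segment]) for total in band_sums]
        for segment, band_sums in sums.items()
    }
    return means, positions

def map_back(segment_labels, positions, n_pixels):
    """Write one label per segment back to all of its pixel positions."""
    labels = [None] * n_pixels
    for segment, label in segment_labels.items():
        for position in positions[segment]:
            labels[position] = label
    return labels

# Hypothetical image with four pixels, two bands, and two segments.
spectra = [[0.10, 0.30], [0.20, 0.50], [0.40, 0.40], [0.60, 0.20]]
segment_ids = [0, 0, 1, 1]
means, positions = segment_means(spectra, segment_ids)
labels = map_back({0: "vegetation", 1: "impervious"}, positions, len(spectra))
```

Only one spectrum per segment has to be classified, which is where the reduction in data size and processing time comes from.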

Figure III-2: Five subsets from the HyMap image before segmentation and data at average segment sizes of 3.4, 8.5, 13.1 (top to bottom; R = 829 nm; G = 1648 nm; B = 662 nm).

Chapter III:3.3 Support vector machines

The training of the SVM was performed using the C-SVM approach in LIBSVM (Chang and Lin, 2001). An RBF kernel was used to transform the data (Vapnik, 1998). An in-house implementation of LIBSVM for remote sensing data was used to train SVM over wide ranges of values for γ and C and to evaluate their quality based on a 4-fold cross validation (Janz et al., 2007). This way, optimal parameters could be found for the binary OAA classifiers and over-fitting to the training data was avoided.
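
The parameter search can be outlined as a grid search with k-fold cross validation. The sketch below is not the in-house implementation: the parameter ranges are invented, the fold assignment is a simple interleaving without shuffling or stratification, and `toy_evaluate` is a placeholder for training and scoring one SVM on a given fold.

```python
import itertools

def k_fold_indices(n_samples, k=4):
    """Yield (train, test) index lists; sample i is tested in fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test

def grid_search(n_samples, c_values, gamma_values, evaluate, k=4):
    """Return (mean score, C, gamma) of the best cross-validated pair."""
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        scores = [evaluate(train, test, c, gamma)
                  for train, test in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if best is None or mean_score > best[0]:
            best = (mean_score, c, gamma)
    return best

def toy_evaluate(train, test, c, gamma):
    """Placeholder score; a real version would train an SVM on `train`
    and return its accuracy on `test`."""
    return 1.0 / (1.0 + abs(c - 10.0) + abs(gamma - 0.1))

# Hypothetical logarithmic parameter ranges.
c_values = [0.1, 1.0, 10.0, 100.0, 1000.0]
gamma_values = [0.001, 0.01, 0.1, 1.0, 10.0]
best_score, best_c, best_gamma = grid_search(
    2133, c_values, gamma_values, toy_evaluate)
```

Selecting parameters by held-out folds rather than by training accuracy is what guards against over-fitting; in the OAA setting the search would be repeated for each binary classifier.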

Chapter III:3.4 Training and validation data

The sampling strategy of the present work focuses on describing the transition zones between classes rather than on homogeneous areas. Since spectrally heterogeneous urban classes are too diverse to generate artificial mixtures as in Foody and Mathur (2006), a clustered sampling strategy was performed: first, 64 seed pixels were randomly drawn from the full image. The 29 pixels of the cluster around each of these seeds (a 5 x 5 window plus the four outer diagonal pixels) were then assigned to one of the five land cover classes. A smaller number of additional seed pixels were interactively placed on rare but characteristic surfaces, which were not present in the randomly selected data set. These included unweathered asphalt, very bright parking lots, impervious sports fields with artificial lawn or tartan, soil surfaces at both construction areas and on fallow land, plus rare roof materials. All pixels were labeled based on very high resolution aerial photographs. Most of the clusters contained at least one class boundary and thus mixed pixels of two or more classes. By sampling adjacent pixels, mixed pixels were usually sampled along with corresponding purer pixels. This way, the transition zones between two classes were represented by sets of pixels describing a gradient from pure over differently mixed to, again, pure pixels, and the position of the hyperplane was narrowed down at several positions in spectral feature space. The original number of pixels from vegetated areas – the most frequent surface type – was randomly reduced. This way, the proportion of the three main classes vegetation, impervious and built-up was more balanced and training times decreased, while the accuracy for vegetation was expected to remain good (Table III-1). The overall number of 2133 training pixels corresponds to 0.057% of all pixels.
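
The cluster geometry and the stated training fraction can be checked with a short sketch. The position of the "four outer diagonal pixels" is an assumption here (taken as the diagonal neighbors just outside the window corners, at offsets of ±3 rows and ±3 columns); the scene dimensions are those given in Section 3.1.

```python
def cluster_offsets():
    """Row/column offsets of one training cluster around a seed pixel.

    5 x 5 window (25 pixels, including the seed) plus four diagonal
    pixels outside the window corners (assumed at +/-3, +/-3).
    """
    window = [(d_row, d_col)
              for d_row in range(-2, 3) for d_col in range(-2, 3)]
    outer_diagonals = [(d_row, d_col)
                       for d_row in (-3, 3) for d_col in (-3, 3)]
    return window + outer_diagonals

offsets = cluster_offsets()

# Share of training pixels in the 512 by 7,277 pixel scene, in percent.
training_fraction = 100.0 * 2133 / (512 * 7277)
```

The offsets give 29 pixels per cluster, and the fraction rounds to the 0.057% stated in the text.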

Table III-1: Distribution of training pixels by classes.

| Class | vegetation | built-up | impervious | pervious | water | Total |
| --- | --- | --- | --- | --- | --- | --- |
| No. training pixels | 631 | 564 | 556 | 266 | 116 | 2133 |

For statistical validation 1253 independent reference pixels were selected from the HyMap image and assigned to one of the five land cover classes based on very high resolution aerial photographs. The sampling was not purely random to investigate the classification accuracy with regard to different urban structure types. Rectangular polygons of about 200 by 300 pixels were manually drawn in homogeneous areas of six typical urban structure types, including: the city centre with business areas and the governmental district (center); dense residential areas with attached buildings and narrow courtyards (dense); open residential areas with private gardens (single); pre-cast apartment complexes surrounded by recreational areas (complexes); individual houses surrounded by agricultural patches and forest along the urban-suburban fringe (suburban); industrial and commercial grounds (industrial). About 150 reference pixels were randomly drawn from each urban structure type. Each structure type is characterized by different class proportions (Table III-2). To better investigate the classification quality in dark areas, 158 extra points were randomly selected using a dark area mask (reflectance at 1.650 µm < 5%). In addition, 197 pixels were randomly selected from the rest of the image, to represent remaining areas.

Table III-2: Reference pixels of the five land cover classes as distributed over the urban structure type.

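| Class / Reference pixels randomly selected from | center | dense | single | complexes | suburban | industrial | dark | rest | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| vegetation | 28 | 52 | 91 | 77 | 112 | 42 | 55 | 108 | 565 |
| built-up | 58 | 41 | 36 | 23 | 5 | 29 | - | 32 | 224 |
| impervious | 57 | 54 | 18 | 40 | 2 | 62 | 42 | 34 | 309 |
| pervious | 1 | 4 | 4 | 7 | 26 | 18 | - | 12 | 72 |
| water | 4 | - | - | 2 | 5 | - | 61 | 11 | 83 |
| Total | 148 | 151 | 149 | 149 | 150 | 151 | 158 | 197 | 1253 |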

Chapter III:4 Results

Chapter III:4.1 SVM classification of pixels

The SVM classification of the original image leads to an overall accuracy of 88.7% and a kappa coefficient (κ) of 0.84 using all 1,253 reference pixels. Slightly more than half of the area is classified as vegetation (52.7%). Impervious grounds account for 22.3% of the area and built-up areas for 16.2%. Pervious and water are the smallest classes at 4.8% and 3.9%, respectively. The accuracy assessment shows different degrees of confusion between the classes (Table III-3).

Table III-3: Confusion matrix including producer’s/user’s accuracy [%] of pixel-based SVM classification.

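| Image pixels / Reference pixels | vegetation | built-up | impervious | pervious | water | Total | User’s accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| vegetation | 541 | 4 | 5 | 7 | 2 | 559 | 96.8 |
| built-up | 0 | 183 | 24 | 5 | 0 | 212 | 86.3 |
| impervious | 20 | 33 | 270 | 21 | 3 | 347 | 77.8 |
| pervious | 4 | 4 | 6 | 39 | 0 | 53 | 73.6 |
| water | 0 | 0 | 4 | 0 | 78 | 82 | 95.1 |
| Total | 565 | 224 | 309 | 72 | 83 | 1253 | |
| Producer's acc. | 95.8 | 81.7 | 87.4 | 54.2 | 94.0 | | |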
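
The accuracy measures reported for Table III-3 follow directly from the confusion matrix. The sketch below recomputes them from the tabulated counts, with rows as classified image pixels and columns as reference pixels; the function name is hypothetical and the kappa formula is the standard Cohen's kappa.

```python
classes = ["vegetation", "built-up", "impervious", "pervious", "water"]

# Confusion matrix of Table III-3: rows = image pixels, columns = reference.
matrix = [
    [541, 4, 5, 7, 2],
    [0, 183, 24, 5, 0],
    [20, 33, 270, 21, 3],
    [4, 4, 6, 39, 0],
    [0, 0, 4, 0, 78],
]

def accuracy_measures(matrix):
    """Overall accuracy [%], kappa, user's and producer's accuracies [%]."""
    n = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(len(matrix))]
    row_totals = [sum(row) for row in matrix]
    column_totals = [sum(column) for column in zip(*matrix)]
    observed = sum(diagonal) / n
    # Chance agreement expected from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, column_totals)) / n ** 2
    kappa = (observed - expected) / (1.0 - expected)
    users = [100.0 * d / r for d, r in zip(diagonal, row_totals)]
    producers = [100.0 * d / c for d, c in zip(diagonal, column_totals)]
    return 100.0 * observed, kappa, users, producers

overall, kappa, users, producers = accuracy_measures(matrix)
```

With these counts the overall accuracy rounds to 88.7% and κ to 0.84, in agreement with the values given in the text.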

Based on the stratified set of reference pixels, the quality of the SVM classification was evaluated for different urban structure types (Table III-4). The accuracy of vegetation is lowest in dense residential areas, where many trees are located in dark courtyards or along streets in the shadow behind houses. The class built-up shows producer’s accuracies of about 85% or higher for the city center, dense residential areas, and apartment complexes. The lowest value exists for industrial areas at 72.4%. The accuracies of impervious areas are nowhere below 75% and reach 95% for areas with apartment complexes. Looking at the overall values, accuracies of more than 80% are achieved for all structure types. The accuracies of single residential, apartment complexes and suburban, i.e. areas with great proportions of vegetated areas, are highest.

Table III-4: Producer’s and user’s accuracies [%] of vegetation, built-up, impervious, and pervious and the overall accuracy by urban structure types in the pixel-based approach. Values for n < 20 are not shown.

| Class / Urban structure type | center | dense | single | complexes | suburban | industrial |
| --- | --- | --- | --- | --- | --- | --- |
| vegetation | 89.3/100 | 80.8/95.5 | 97.8/98.9 | 100/97.5 | 99.1/95.7 | 100/95.5 |
| built-up | 84.5/81.7 | 85.4/89.7 | 77.8/87.5 | 87.0/90.0 | - | 72.4/80.8 |
| impervious | 77.2/80.0 | 90.4/75.8 | - | 95.0/88.4 | - | 85.5/77.9 |
| pervious | - | - | - | - | 80.8/95.5 | - |
| Overall | 83.1 | 84.6 | 89.3 | 94.0 | 94.7 | 80.8 |

Chapter III:4.2 SVM classification of segments

At first, the results from the segment-based approach are assessed by a visual comparison of maps based on the pixel image and the segmented data (Fig. III-3). In general, segment-based maps appear more homogeneous, but misclassified segments result in an areal misclassification of a group of pixels and some necessary spatial detail is removed. A typical example of the effects of image segmentation can be seen in the case of a sports gym (Fig. III-3.a). The building is best represented at an average segment size of 8.5 or 13.1, whereas surrounding patterns of paved paths and vegetated as well as non-vegetated patches result in large, mainly misclassified areas at these levels. Within residential areas, the fragmented patches of built-up and impervious pixels disappear in the segment-based maps (Fig. III-3.b). At the same time, the increasing segment size leads to the misclassification of groups of built-up pixels, especially next to shadowed areas and bright facades, which are visible at large view-angles. Cars are often classified built-up at pixel level, because of their similarity to metal roofs. This phenomenon can for example be observed on parking lots (Fig. III-3.c). At increasing segment sizes this heterogeneous area is spatially and hence spectrally generalized and the area is uniformly classified impervious. In the case of small trees on impervious areas a similar effect exists, which is not necessarily intended by the generalization (Fig. III-3.d).

Figure III-3: Subsets from classified data at different levels. Pixel level, segment sizes of 3.4, 8.5, 13.1, and the multi-level classification are displayed (top to bottom).

Table III-5: Accuracies of segment-based classifications and multi-level approach by urban structure types. The highest accuracy of each region is indicated by bold numbers.

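| Avg. segment size | center | dense | single | complexes | suburban | industrial | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| pixel | 83.1 | 84.6 | 89.3 | 94.0 | 94.7 | 80.8 | 88.7 |
| 2.4 | 85.1 | 82.0 | 88.6 | 91.2 | 96.0 | 78.2 | 87.6 |
| 3.4 | 85.1 | 82.0 | 87.9 | 91.2 | 97.3 | 76.8 | 87.6 |
| 4.8 | 83.1 | 78.7 | 85.2 | 89.9 | 94.7 | 76.2 | 85.4 |
| 6.5 | 83.1 | 77.3 | 82.6 | 87.3 | 94.0 | 75.5 | 84.1 |
| 8.5 | 83.1 | 76.8 | 82.6 | 84.6 | 94.0 | 75.5 | 83.3 |
| 10.7 | 83.8 | 74.8 | 83.9 | 85.2 | 95.3 | 74.8 | 83.4 |
| 13.1 | 84.5 | 75.5 | 85.2 | 85.2 | 94.7 | 71.5 | 83.2 |
| 15.7 | 84.5 | 74.2 | 83.2 | 84.6 | 95.3 | 72.2 | 83.1 |
| 18.5 | 83.1 | 74.2 | 82.6 | 83.2 | 94.7 | 72.8 | 82.4 |
| 21.5 | 82.4 | 73.5 | 81.2 | 83.2 | 94.7 | 71.5 | 81.8 |
| multi-level | 85.8 | 83.4 | 87.2 | 91.3 | 96.7 | 78.1 | 87.8 |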

The positive impression from the more homogeneous segment-based maps is not confirmed by the statistical accuracy assessment. A decrease of 1.1% from pixels to the smallest segments, followed by an almost steady decrease in accuracy with increasing segment size, can be observed (Table III-5). The best segment-based overall accuracy of 87.6% (κ = 0.82) is achieved at average segment sizes of 2.4 and 3.4 pixels. From segment size 4.8 onwards, the difference to the pixel-based result exceeds 3% and is significant at the 95% level of confidence (e.g. Z = 3.79 for size 4.8) based on a McNemar test (Foody, 2004).
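
The significance test can be sketched as follows. It assumes the McNemar statistic without continuity correction, z = (f12 − f21) / √(f12 + f21), where f12 and f21 count the reference pixels classified correctly by only one of the two maps. The counts below are hypothetical; the per-pixel agreement counts of this study are not reported here.

```python
import math

def mcnemar_z(only_first_correct, only_second_correct):
    """McNemar z statistic for two classifications of the same samples.

    only_first_correct: samples correct in map 1 but wrong in map 2 (f12).
    only_second_correct: samples wrong in map 1 but correct in map 2 (f21).
    """
    difference = only_first_correct - only_second_correct
    return difference / math.sqrt(only_first_correct + only_second_correct)

# Hypothetical counts: 48 pixels correct only in the pixel-based map,
# 16 pixels correct only in a segment-based map.
z = mcnemar_z(48, 16)
significant = abs(z) > 1.96  # two-sided test at the 95% confidence level
```

Because both maps are validated with the same reference pixels, the test uses only the discordant pixels and does not assume independent samples.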

A detailed assessment of the overall accuracy at different aggregation levels shows different developments between the urban structure types, but also some similarities (Table III-5). Most areas experience a decrease between pixel level and the lowest aggregation level. The center and the suburban area are exceptions to this: accuracy increases by up to 2.0% and 2.6%, respectively, at the two smallest segment sizes. The accuracy for suburban areas remains close to the pixel-level value at all segment levels, whereas the accuracy for the center decreases, with a relative maximum at segment size 13.1. For the other four inner urban areas, the overall accuracy shows varying irregular patterns of decrease with one or two relative maxima, which are usually below those of the pixel level. The relative maxima of classification accuracy for the various urban structure types occur at different segment sizes.

For comparison, a second segment-based approach was performed. In this approach, training is segment-based, i.e. the spectral mean values of those segments that contain the original training pixels are used for the training. This way, an individual SVM is trained for each segmented image and then used for the classification of this data set, as is usually done in segment-based studies. The overall accuracies achieved by this approach at different aggregation levels are generally 2% or more below those of the corresponding classifications with SVM trained on pixels (results not shown).
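The construction of such a segment-based training set can be sketched as follows. This is a simplified illustration, not the implementation used in the study: the function names and the toy data are hypothetical, and the sketch assumes that one training sample is kept per original training pixel, even if several training pixels fall into the same segment.

```python
from collections import defaultdict

def segment_mean_spectra(spectra, segment_ids):
    """Return {segment_id: mean spectrum} for per-pixel spectra.

    spectra: list of per-pixel band values (one list per pixel)
    segment_ids: list with the segment label of each pixel
    """
    sums = {}
    counts = defaultdict(int)
    for spectrum, seg in zip(spectra, segment_ids):
        if seg not in sums:
            sums[seg] = [0.0] * len(spectrum)
        for b, value in enumerate(spectrum):
            sums[seg][b] += value
        counts[seg] += 1
    return {seg: [s / counts[seg] for s in band_sums]
            for seg, band_sums in sums.items()}

def segment_training_set(spectra, segment_ids, training_pixels):
    """Replace each training pixel's spectrum by the mean of its segment.

    training_pixels: list of (pixel_index, class_label)
    """
    means = segment_mean_spectra(spectra, segment_ids)
    return [(means[segment_ids[i]], label) for i, label in training_pixels]

# Hypothetical toy data: 4 pixels, 2 bands, 2 segments
spectra = [[0.10, 0.30], [0.20, 0.50], [0.60, 0.70], [0.80, 0.90]]
segment_ids = [0, 0, 1, 1]
training = segment_training_set(
    spectra, segment_ids, [(0, "vegetation"), (3, "built-up")])
```

With pixel-based training, by contrast, the original spectra of the training pixels are used unchanged and only the data to be classified are replaced by segment means.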

In addition, the integration of segment features such as area or texture measures into a segment-based approach was tested. Again, results were worse and more irregular than those presented. This might be explained by the low accuracy of spectral classification at segment sizes that lead to meaningful values for segment-specific features.

Chapter III:4.3 Multi-level classification of fused data

For the multi-level approach, three sets of OAA rule images were combined into one set of mean values (pixel level; average segment sizes 3.4 and 13.1). The visual assessment shows that many positive effects of classifications at varying segment sizes are combined in this fused classification result (Fig. III-3, bottom). For example, some detail of the paths around the gym is preserved while the shape of the building’s roof is well represented (Fig. III-3.a, bottom); individual trees are not generalized in the map, while the building complexes appear as homogeneous areas with relatively accurate outlines (Fig. III-3.d, bottom). The case of a metal roof is especially interesting (Fig. III-3.e): several roof pixels are classified as water at pixel level, caused by the similarity of this rare roof material to specular reflecting water pixels in the training data. Entire patches of the roof are misclassified in the segment-based approach, whereas the fused classification achieves the overall best results.

The statistical assessment shows that results are not a simple average of the three individual maps derived from the rule images involved (Table III-5). At 87.8% (κ = 0.82), the overall accuracy is slightly, but not significantly, higher than in the single-level segment-based approaches and lower than the pixel-based results. The confusion matrix is similar to that of the pixel-based approach. Overall accuracies for the different urban structure types are similar to those achieved by the pixel-based approach and, in the case of the center and suburban areas, better. The same holds for the producer’s and user’s accuracies of the individual classes (results not shown).

Chapter III:5 Discussion

Chapter III:5.1 Performance of the pixel-based classification

At 88.7%, the overall accuracy of the SVM classification of the urban HyMap data is high. To a certain extent, this accuracy is due to the high abundance of vegetated surfaces in Berlin. Still, the overall producer’s accuracies of the critical classes impervious, pervious, and built-up are well balanced and all close to 75% or clearly above. This underlines the capability of SVM to differentiate spectrally similar, multi-modal classes. The assessment of dark areas, i.e. shaded impervious and vegetation pixels plus water, exhibits an overall accuracy of 89.2% based on 282 corresponding reference pixels. These high accuracies show that SVM can be recommended for mapping complex classes in urban areas. The classification problem can be tackled in a one-step approach, where thematic and spectral definitions are identical and accurate maps are produced in a simple, more intuitive and time-saving manner.

Despite this high accuracy, some confusion remains that is partially caused by the spectral similarity of materials from different land cover classes. Sometimes, complete roofs or impervious surfaces are misclassified. This underlines the limits of purely spectral classification and the ambiguity of urban surfaces even in hyperspectral data (Herold et al., 2003). The accuracy of the class pervious differs between low values for open soils on construction sites in the city and high accuracies for natural soils at the urban fringe. The distinction between non-vegetated pervious and impervious areas within the city appears critical even with hyperspectral data. This generally questions the suitability of approaches like the vegetation-impervious surface-soil model (Ridd, 1995).

Besides the spectral ambiguity of materials, the occurrence of mixed pixels can be named as a source of error. In this context, the assessment of individual reference pixels shows problems for pixels from built-up, impervious, or pervious areas with little abundance of vegetation. They are often confused because the influence of the vegetation signal obscures the slight spectral changes among the three non-vegetated surfaces (Fig. III-3.a,c).

Considering the high overall accuracy and the remaining patterns of confusion, the strategy of clustered sampling of the training data performs very well and significantly decreases the time needed for sample collection. A further assessment of ideal sampling approaches for urban areas is worthwhile but goes beyond the scope of this work.

The classification accuracy in the different urban structure types reflects the discussed causes of confusion against the background of spatial patterns: vegetation shows high accuracies for all areas, but in dense residential areas with many small vegetated patches, i.e. a high number of mixed vegetation pixels, the producer’s accuracy decreases to 80.8%. Due to the same phenomenon, values are low for built-up in single-house residential areas. The low accuracy of this class in industrial areas can often be explained by the spectral similarity of tar roofs to impervious surfaces or by the manifold painted or coated roof materials that are not all represented in the training data. In the city center, in dense residential areas and in areas with apartment complexes, materials are less diverse and the accuracy for built-up is hence higher. The relatively low accuracy of the class impervious in the city center is partly due to the high number of cars on the wide boulevards, which are spectrally more similar to roof materials.

Chapter III:5.2 Effect of image segmentation on spectral classification

Maps based on segmented data (Fig. III-3) are more homogeneous than the pixel-based classification and thus better suited for many subsequent analyses. The spectral generalization removes some disturbing effects, e.g. cars on impervious surfaces. While positive and negative influences can be observed simultaneously, no single segment level can be identified where positive effects seem to clearly dominate for all urban structure types. However, negative impacts become more frequent at average segment sizes greater than 13.1. This is in accordance with Bruzzone and Carlin (2006), who also observed a shift towards negative effects when segments become larger than natural objects.

The size of natural objects and the contrast to surrounding areas turn out to be of great influence with increasing segment sizes. Natural object outlines correspond to the outlines of large segments in the case of great contrast between an object and its surrounding pixels (Fig. III-3.a,d). Within these large segments, mixed pixels along the edges are of little spectral influence and maps are more accurate than at small segment sizes. In densely populated residential areas, the thematic boundaries between buildings, roads and vegetated areas are often obscured by non-thematic phenomena like shadow, visible facades and illumination differences on roofs (Fig. III-3.b). In the case of shadowed areas, for example, dark vegetated and non-vegetated pixels exist next to one another. Segment outlines heavily depend on the overall illumination and rather follow the outline of the shadowed area. In a similar way, large segments might comprise bright facades, adjacent sidewalks, and sometimes parts of the well illuminated southern side of a roof, or they include dark roofs and adjacent dark shadowed areas (Fig. III-3.b). A correct assignment of such two-class segments is not possible and phenomena like shadow pose severe problems to segment-based approaches. Segment-based results are thus very sensitive to slight changes to the setup, and differences of large areal extent might occur from one segmentation level to the next (Fig. III-3.b).

In the same way as spectral generalization of very small natural objects is positive concerning cars on roads, it can be negative in the case of individual trees (Fig. III-3.d) or along linear objects represented by mixed pixels (Fig. III-3.a). In general, the large number of mixed pixels impacts results twice. First, areas with many mixed pixels are always hard to classify. Second, segment outlines might be arbitrarily placed in such regions and, as a consequence, all pixels in the corresponding segments have a higher chance of misclassification.

The quality of classifications at increasing aggregation levels differs between the urban structure types, due to the differences of spatial structures and material composition. As expected, this can to some extent be related to dominating structures: the rather small structured residential areas should be classified at lowest aggregation levels (Table III-5). The city center and suburban areas perform well at medium levels, due to larger spatial structures. Accuracy for apartment complexes suffers from the shadow and facade phenomenon. The high accuracy for single house residential structures can be explained by the high fraction of vegetation. Results underline that no ideal average segment size exists for the work in heterogeneous urban environments. This drawback might be tackled by using additional information on the outlines of natural objects from external knowledge bases like cadastres or GIS layers.

Compared to the visual impression from the segment-based maps, the decrease in statistical accuracy from pixel to segment level in inner-urban areas is not surprising. Increased homogeneity cannot be considered more accurate, but rather more favorable for human perception. Two main reasons can be named for decreasing accuracy: (1) the spatial composition of urban areas is very heterogeneous, even within a single urban structure type (e.g. Fig. III-3.d), and the manifold natural objects can hardly be represented in one segmentation level; (2) the level of detail aimed for in the present classification, i.e. the distinction between buildings and other impervious areas, is at the limit of the 4 m spatial resolution of HyMap. This is underlined by the fact that only the two very small segment sizes of 2.4 and 3.4 pixels lead to results not significantly below those in the pixel-based approach. To further test local heterogeneity, a 3×3 majority filter was applied to pixel-based results. Unlike in many other studies, the filtering leads to significantly lower results at 85.8% (Z = 3.37). This underlines the heterogeneity of the area, but at the same time shows that the adaptive region-growing during segmentation is superior to generalization with fixed window sizes.
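A fixed-window generalization of this kind can be sketched as follows. The border and tie handling are assumptions of this sketch and are not specified in the text: the window is clipped at the image border, and a pixel keeps its label if that label is among the most frequent ones. The function name and the toy map are hypothetical.

```python
from collections import Counter

def majority_filter_3x3(labels):
    """Apply a 3x3 majority filter to a 2-D grid of class labels.

    Assumptions of this sketch: the window is clipped at the image border,
    and on ties the pixel keeps its original label if that label is among
    the most frequent ones.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            window = [labels[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            counts = Counter(window)
            best = max(counts.values())
            winners = [k for k, v in counts.items() if v == best]
            out[r][c] = labels[r][c] if labels[r][c] in winners else winners[0]
    return out

# Hypothetical toy map: a single vegetation pixel inside an impervious area
toy = [["imp", "imp", "imp"],
       ["imp", "veg", "imp"],
       ["imp", "imp", "imp"]]
filtered = majority_filter_3x3(toy)
```

In the toy map the isolated vegetation pixel is removed regardless of whether it is noise or a real small object, which illustrates why a fixed window can lower accuracy in areas with many small natural objects.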

The development of segment-based classification accuracy for different urban structure types with increasing segment sizes suggests the ratio between the average size of natural objects and the area represented by one pixel to be of great importance. It can thus be expected that a similar analysis on higher resolved data leads to a better description of the characteristic spatial scale and results with presumably higher accuracies for low or medium aggregation levels. However, there is no hyperspectral data with HyMap’s spectral properties and better spatial resolution available.

Chapter III:5.3 Performance of the multi-level classification

Despite its slightly, i.e. not significantly, lower overall accuracy, the map of the multi-level classification is preferred over those of the single-level approaches. It combines positive effects of different segmentation levels into one single result and in this way achieves good results for varying urban structure types.

Although information from three levels is fused by simply averaging the rule images, the result appears superior to a simple average: the overall accuracy is above the average of the three original values and positive effects seem to outweigh negative influences. This can to some extent be explained by the maximum value decision in the OAA approach: (1) In the case of mixed pixels or pixels from ambiguous materials, two or more classes might show positive values for their corresponding OAA decision value. At different aggregation levels, this pixel is then merged with different groups of neighboring pixels and hence information on the local situation at different scales is included in the classification (Bruzzone and Carlin, 2006). The average of the three decision values for this pixel in the rule images is expected to be more representative. The successful concept of classifier ensembles relies on a similar assumption (Kittler, 1998). (2) If, for example, a single tree achieves a very high positive decision value for the class vegetation at pixel level, plus lower positive values for vegetation and a second class at segment levels, the class vegetation will win in the combined approach, although the tree might not be recognized in all segment-based classifications.
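The averaging of rule images followed by the maximum value decision can be sketched as below. This is an illustration under stated assumptions, not the implementation used in the study: the function name and data layout are hypothetical, and the decision values for the single-tree pixel are invented for illustration only.

```python
def fuse_rule_images(levels):
    """Average per-class decision values over aggregation levels.

    levels: list of rule-image sets, one per level; each set is a list of
            pixels, each pixel a dict {class_name: OAA decision value}
    Returns (mean_values, labels), one entry per pixel.
    """
    n_levels = len(levels)
    n_pixels = len(levels[0])
    mean_values, labels = [], []
    for p in range(n_pixels):
        classes = levels[0][p].keys()
        mean = {c: sum(level[p][c] for level in levels) / n_levels
                for c in classes}
        mean_values.append(mean)
        labels.append(max(mean, key=mean.get))  # maximum value decision
    return mean_values, labels

# Invented decision values for one tree pixel (not data from this study):
# pixel level, average segment size 3.4, average segment size 13.1
levels = [
    [{"vegetation": 1.8, "impervious": -1.2}],
    [{"vegetation": 0.2, "impervious": 0.4}],
    [{"vegetation": 0.1, "impervious": 0.5}],
]
mean_values, labels = fuse_rule_images(levels)
```

In this example both segment levels on their own would assign the pixel to impervious, while the strong pixel-level decision value lets vegetation win after averaging, which corresponds to case (2) above.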

The multi-level approach is straightforward and fast to perform: only one SVM classifier is trained on pixel information and applied to unsegmented data and all segmented images. Results might be improved by giving different weights to classes at different levels or by incorporating hierarchical information. Such optimized multi-level approaches are often suggested in literature to deal with complex environments (Damm et al., 2005; Schöpfer and Moeller, 2006). However, the development of transferable approaches that include information from different levels of aggregation is a time-consuming task (Schöpfer and Moeller, 2006). Especially for very large heterogeneous data sets that comprise very different urban structure types, such as the data in this work, more complex multi-level approaches do not appear feasible.

In Bruzzone and Carlin (2006), two subsets of pan-sharpened Quickbird data from an urban area are used to test a similar multi-level SVM classification approach. They combine the spectral information at pixel level with spectral mean values and variances from segmented data sets and perform training and classification on this fused data set. In the present work, training SVM on combinations of several segmentation levels, with and without spectral variance values or texture measures, was also tested. Results did not improve compared to pixel level, and again training based on the spectral information from spectrally generalized segmented data does not appear useful, as in the second segment-based approach (compare Section 4.2). This might be explained by the lower spatial resolution of HyMap compared to Quickbird.

Thus, the simple multi-level approach taken here with a pixel-based training and classification at multiple segmentation levels appears useful for the classification of heterogeneous data. It can be expected to be transferable to other spectral classification problems. For the work with very high spatial resolution data, the number of mixed pixels and hence their negative influence is significantly lower. Results are then expected to be more positively influenced by image segmentation, as shown in Bruzzone and Carlin (2006) and Shackelford and Davis (2003).

Chapter III:6 Conclusions

High overall accuracies are achieved using a purely spectral SVM classification approach on hyperspectral data from an urban area. SVM delineate broad thematic classes without the previous definition of spectrally homogeneous sub-classes or separate treatment of dark areas. This way, it was proven that SVM are capable of describing complex class distributions.

This study advances the understanding of segment-based image processing in heterogeneous environments by performing a segment-based approach with a narrow focus on spectral properties and in direct comparison to pixel-based results. The influence of the spectral generalization on the purely spectral, supervised classification was investigated. Different effects were identified with regard to average segment sizes. The findings can be a guideline in future remote sensing analyses of urban areas.

Results from the present work suggest that spectral information from data at this spatial resolution should best be included in segment-based analyses at low aggregation levels or at pixel level. More importantly, the training of a spectral classifier should be performed on original pixel values. When segment-specific features are used in the classification, a combined pixel- and object-based approach similar to the one performed in Shackelford and Davis (2003) appears useful, although no positive influence of segment-specific features could be identified for the present data set.

The multi-level approach applied in this work can be recommended for its ability to incorporate positive effects of segment-based analyses at various levels into one single map. The quality of segment-based or multi-level approaches might be enhanced by incorporating more segment features and multi-level hierarchies. However, designing such multi-level segmentations and corresponding classifiers is time-consuming and they are harder to transfer to new environments. The simplicity and fast implementation are additional assets of the approach taken in this work.

Acknowledgments

The authors are thankful to A. Damm, P. Griffiths and M. Langhans (HU Berlin) for performing most of the atmospheric preprocessing. S. van der Linden was funded by the scholarship programme of the German Federal Environmental Foundation (DBU). This research was partly funded by the German Research Foundation (DFG) under project no. HO 2568/2-1. The two anonymous reviewers provided valuable input on the experimental setup and the manuscript.

