  • Information Extraction from Remotely Sensed Images: From Data to Information

    Data refers to the numerical results of any set of measurements, regardless of whether or not the measurements are acquired with a certain purpose in mind. Information is an aggregate of facts so organized, or data so utilized, as to be knowledge or intelligence. Data have to be transformed to derive information. The process of transforming data into information is known as “Information Extraction”. Information extraction can be of three types:

    1. Manual
    2. Semiautomatic
    3. Automatic

    Information is the Key. Exploiting the image has almost superseded the image itself. The information derived from imagery is what provides earth resource managers with what they need to make decisions: the best place for a new dam, the height of flood defenses and so on. Remote sensing is all about information: the tangible result of all the effort that goes into building and operating an earth observation platform is a set of measurements of the Earth system from space, from which we can derive information of economic, social, strategic, political or environmental value. As sensors are continually developed and refined, image-processing tools have to change to ensure that new data can be fully exploited. The last 26 years have been characterized by steadily increasing spatial resolution and the rise of microwave SAR, but what effect will this have on how we process the data? Many more sensors are planned for the next few years, all of which will inevitably require new processing tools. By far the most consistent trend has been the improvement in spatial resolution of optical images since the 80-meter Multi Spectral Scanner on board Landsat 1; images will be as sharp as 50 centimeters in the case of QuickBird. Whilst higher resolution enables greater identification of small objects, it causes traditional land classification techniques to become unreliable, because the contributions of different material types within a pixel distort the pixel spectrum from that of the material of interest, often resulting in a loss of discrimination and potential misclassification. Other techniques for addressing these problems have been tried in the past, such as neural networks, but with limited success and reliability, and they are hence used very little commercially.

    Where are we heading?

  • The era of 1-meter satellite imagery presents new and exciting opportunities for users of spatial data. With Space Imaging's IKONOS satellite already in orbit and satellites from EarthWatch Inc., Orbital Imaging Corp. and, of course, ISRO scheduled for launch in the near future, high-resolution imagery will add an entirely new level of geographic knowledge and detail to the intelligent maps that we create from imagery. Geographic imagery is now widely used in GIS applications worldwide. Decisions made using these GIS systems by national, regional and local governments, as well as commercial companies, affect millions of people, so it is critical that the information in the GIS is up to date. In most instances, aerial or satellite imagery provides the most up-to-date source of data available, helping to ensure accurate and reliable decisions. However, with technological advancements come new opportunities and challenges. The challenge now facing the geotechnology industry is twofold: how best to fully exploit high-resolution imagery, and how to get access to it in a timely manner. Is high-resolution imagery making a difference? There is no doubt that the GIS press has been deluged with high-resolution imagery for the last few years. Showing an application with an imagery backdrop provides an immediate visual cue for readers. Without the imagery backdrop the context is lost, and the basic map, comprising polygons, lines and points, becomes more difficult for the layman to interpret. It is the context, or visual clues, that provide the useful information, and this information is the inherent value of the imagery. The higher the resolution of the imagery, the more man-made objects can be identified. The human eye, the best image processor of all, can quickly detect and identify these objects. If the application is therefore one that just requires an operator to identify objects and manually add them into the GIS database, then the imagery is making a positive difference: it is adding a new data source for the GIS manager to use. However, if information must be extracted from the imagery in an automated or semi-automated fashion (for example, a land cover classification), it is a different matter. If the same techniques that were developed for earlier, lower-resolution satellite imagery (such as maximum likelihood classification) are used on high-resolution imagery, the results can actually create a negative impact. Whilst lower-resolution imagery isn't affected greatly by artifacts such as shadows, high-resolution data can be. Lower-resolution data also smoothes out variations across ranges of individual pixels, allowing statistical processing to create effective land cover maps. Higher-resolution data doesn't do this: individual pixels can represent individual objects like manhole covers, puddles and bushes, and contiguous pixels in an image can vary dramatically, creating very mixed or confused classification results. There is also the issue of linear feature extraction. Lines of communication (such as roads) on a lower-resolution image can be identified and extracted as a single line. However, on a high-resolution image, a road comprises the road markings, the road itself, the kerb (and its shadow) and the pavement (or sidewalk). A very different method of feature extraction is therefore needed.

  • It's not just the spatial resolution that can affect the usage of the imagery. With 11-bit imagery becoming available, the ability of the GIS to work with high spectral content imagery becomes key. 11-bit data means that up to 2048 levels of grey can be stored and viewed. If the software being used to view the imagery assumes it is 8-bit (256 levels), then it will either a) display only the information below level 255 (creating a black or very poor image) or b) try to compress the 2048 levels into 256, also reducing the quality of the displayed image considerably. Having 2048 levels allows more information to be extracted in shadowy areas, as well as enabling more precise spectral signatures to be defined to aid in feature identification. However, without the correct software, this added bonus can easily turn into a problem.
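    A small sketch of the scaling issue described above: mapping 11-bit data (0-2047) into an 8-bit display range with a percentile stretch rather than a blind division, so that shadow detail survives. The function name and clip percentiles are illustrative assumptions, not any vendor's API.

```python
import numpy as np

def stretch_11bit_to_8bit(band, low_pct=2.0, high_pct=98.0):
    """Linearly map the low..high percentile range of an 11-bit band to 0..255."""
    lo, hi = np.percentile(band, [low_pct, high_pct])
    scaled = (band.astype(np.float64) - lo) / max(hi - lo, 1e-9)
    return (np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)

# A naive 8-bit viewer would instead clip everything above 255.
band = np.random.randint(0, 2048, size=(512, 512))   # synthetic 11-bit band
display = stretch_11bit_to_8bit(band)                 # detail-preserving stretch
```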

    Information Extraction from Remotely Sensed Images:

    Geoinformation extraction using image data involves the construction of explicit, meaningful descriptions of physical objects (Ballard & Brown, 1982). When performing analysis of complex data, one of the major problems stems from the number of variables involved. Analysis with a large number of variables generally requires a large amount of memory and computation power, or a classification algorithm which overfits the training sample and generalizes poorly to new samples. Feature extraction is a general term for methods of constructing combinations of the variables to simplify these problems while still describing the data with sufficient accuracy. Best results are achieved when an expert constructs a set of application-dependent features. All approaches usually include object recognition, i.e. interpretation using the eye-brain/computer system, and object reconstruction, i.e. coding, digitizing, structuring.

    It can be used in the area of image processing, which involves using algorithms to detect and isolate various desired portions or shapes (features) of a digitized image or video stream. Generally, approaches for information extraction using image processing techniques may be grouped as follows (a combined sketch follows the list):

    Low-level

    Edge detection, corner detection, blob detection, ridge detection, scale-invariant feature transform (SIFT).

    Curvature

    Edge direction, changing intensity, autocorrelation.

    Image motion

    Motion detection.

  • Shape-based

    Thresholding, blob extraction, template matching, Hough transform (lines, circles/ellipses, arbitrary shapes via the generalized Hough transform).

    Flexible methods

    Deformable, parameterized shapes; active contours (snakes).
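    As a combined sketch of the groups above, the snippet below runs a low-level detector (Canny edges, corner detection) and a shape-based one (the Hough transform for lines) over a greyscale image. OpenCV is an assumed dependency, and the file name and thresholds are illustrative.

```python
import cv2
import numpy as np

img = cv2.imread("scene.tif", cv2.IMREAD_GRAYSCALE)  # hypothetical input image

# Low-level: edge and corner detection.
edges = cv2.Canny(img, 50, 150)
corners = cv2.goodFeaturesToTrack(img, maxCorners=200, qualityLevel=0.01,
                                  minDistance=5)

# Shape-based: Hough transform for straight lines (e.g. roads, field edges).
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=60,
                        minLineLength=40, maxLineGap=5)
```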

    Given below, in compiled form, is various terminology used in the context of geoinformation extraction, particularly from image data:

    Scene: the part of the visible world that one would like to describe.

  • INFORMATION EXTRACTION FROM REMOTELY SENSED IMAGES

    Data acquisition and data updating are important aspects in developing and maintaining Geographical Information Systems (GISs). The spatial data in most existing GISs are derived from existing maps through digitization. This method is prone to errors, and the accuracy of the data derived from the existing maps is relatively low, especially temporally. Photogrammetric measurement is another important method for data acquisition. The data produced using this method can have good spatial accuracy. However, the method is relatively expensive, as it needs precision photogrammetric instruments and well-trained professionals. Therefore, methods of obtaining spatial data for GISs efficiently and precisely have become a focus of photogrammetric research. Feature Extraction: The automatic extraction of information from aerial photographs and satellite images is a major requirement of the new digital-based technology in photogrammetry. While a number of tasks such as DEM and orthophoto determination can be achieved with a large degree of automation, the extraction of linear and other features must still be undertaken manually. Research described below aims to develop methods of incorporating a greater level of automation in these tasks. Semi-automatic Feature Extraction: The semi-automatic method for the extraction of linear features on remotely sensed images in 2D and 3D is based on active contour models, or 'snakes'. Snakes are a method of interpolation by regular curves to represent linear features on images. The initial feature extraction is achieved by image processing operators, such as the Canny operator for single edges, and morphological tools for narrow linear features of 1 or 2 pixels in width. The approach developed is semi-automatic, and hence is assisted by an operator who locates a selection of points along and near, but not necessarily exactly on, the feature. The iterative computation then locates the feature as closely as the details in the image will allow by an optimisation process, based on the definition of the snakes by cubic B-splines. The features are extracted on single images by 2D snakes in terms of the local image coordinates, or in 3 dimensions using overlapping images in terms of 3D object coordinates. Tests of the method applied to aerial photography and SPOT satellite images have been carried out in terms of the accuracy of the extracted features and pull-in range, for a range of features in 2 dimensions, and in 3 dimensions in terms of their object coordinates derived from photogrammetric measurements and from maps. Automatic Feature Extraction: In contrast to semi-automatic methods, automatic road extraction aims at locating a road in images without input from an operator on its initial position. Locating a road in images automatically has two tasks, i.e., recognition of a road and determination of its position. Recognizing a road in an image is much more difficult than determining its

  • position, as it requires not only the information which can be derived from the image, but also a priori knowledge about the properties of a road and its relationships with other features in the image, and other related knowledge such as knowledge of the imaging system. Due to the complexity of aerial images and the existence of image noise and disturbances, the information derived from the image is always incomplete and ambiguous. This makes the recognition process more complex. A knowledge-based method for automatic road extraction from aerial images has been developed in this laboratory. The method includes bottom-up hypothesis generation of road segments and top-down verification of hypothesized road segments. The generation of hypotheses starts with low-level processing in which linear features are detected, tracked and linked. The results of this step are numerous edge segments. They are then grouped to form the structure of road segments based on the general knowledge of a road, and the generated structures of road segments are represented symbolically in terms of geometric and radiometric attributes. Finally, road segments are hypothesized by applying the knowledge stored in the knowledge base to the generated road structures. As hypotheses of road segments are generated in a local context, ambiguity is unavoidable. To remove spurious hypothesized road segments, all hypotheses are checked in a global context using the topological information of road networks, which is derived from low-resolution images. The missing road segments are predicted using topological information of road networks. This method has been applied to a number of aerial images with encouraging results.

  • EXTRACTION OF POINTS General Principles for Point Extraction Definition: Points are image objects whose geometric properties can be represented by only two coordinates (x, y). One can distinguish between several types of points. A circular symmetric point (CSP) is a local heterogeneity in the interior of a homogeneous image region. CSPs are too small to be extracted as regions (depending on the image scale) and are characterised by properties of circular symmetry (e.g., peaks, geodetic control point signals, manholes). CSPs can be interpreted as region attributes; they do not affect the image structure. Endpoints (start point or end point of a line), corners (intersection of two lines) and junctions (intersections of more than two lines) are used for the geometrical description of edges and region boundaries. Missing these points can have fatal consequences for the symbolic image description.

    REPRESENTATION: The symbolic description of points can be given as a list containing geometric attributes (the coordinates), radiometric attributes (e.g., strength) and relational attributes (e.g., the edges intersecting at this point). APPLICATIONS: Major applications for extracted image points are image-matching operations. Assuming that extracted points refer to significant points in the real world, we can look for the same real point in two images taken from different views. This technique is used for image orientation (PADERES et al. 1984) or DTM generation (e.g., KRZYSTEK 1991).

  • BASIC APPROACHES Here we only review approaches that solely use the image data (one could also think of point extraction methods which determine junctions or intersections from already extracted contours). Three prominent methods are: point template matching; corner detection based on properties of differential geometry; point detection by local optimization. Deriving the point coordinates normally follows a three-step procedure: in the first step, point regions are selected by applying a threshold procedure. These are image regions inside which points are supposed to lie. In a subsequent step, the best point pixels within these regions are selected; this operation could be referred to as thinning. An even more accurate determination of the point position can be derived by a least squares estimation (LSE), so in this step we look for the real-valued coordinates of points. Point Templates: One possibility to detect point regions is to define a point pattern (template) which represents the point structure we are looking for. The main idea of template matching is to find the places where the template fits best in the image. The similarity between the template and the image can be evaluated by multiplication of the template values with the underlying image intensities or by the estimation of correlation coefficients. Disadvantages of template matching in general are the limitation by the number and types of templates, and sensitivity to changes in scale and to image rotation (unless the templates are rotationally invariant). Corner Detection by Curvature: Let us assume that the image data is stored in an image function g(r, c), where r refers to the row of the image and c to the column. Several approaches are based on the curvature of g, which can be expressed by the second partial derivatives along the coordinate axes r and c. The sign of the curvature can be used for the classification of the pixels and for the detection of corners. An overview and evaluation of these approaches can be found in (DERICHE AND GIRAUDON 1990). Point Detection by Optimization: MORAVEC (1977) was the first to propose an approach aiming at detecting points which can be easily identified and matched in stereo pairs. He suggested measuring the suitability or interest of an image point by the estimation of the

  • variances in a small window (4x4, 8x8 pixels). This method is used in many stereo matching algorithms and initiated further investigations leading to the interest operators proposed by PADERES et al. (1984) and FÖRSTNER AND GÜLCH (1987). Similar to the Moravec operator, the objective of these operators is the detection of adequate points (but with higher accuracy). Adequate points are those which meet the two criteria of (1) local distinctness (to increase geometric precision) and (2) global uniqueness (to decrease search complexity). As shown in the figure, the Förstner operator is able to detect different point types with the same algorithm and can be used either for image matching or image analysis approaches.
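    A minimal sketch of the Moravec interest measure described above: the interest of a pixel is the minimum sum of squared differences between a small window and the same window shifted in a few directions, which is low in flat areas and along edges but high at corner-like points. Window size and shift set are illustrative.

```python
import numpy as np

def moravec_interest(img, half=2, shifts=((1, 0), (0, 1), (1, 1), (1, -1))):
    img = img.astype(np.float64)
    rows, cols = img.shape
    interest = np.zeros_like(img)
    for r in range(half + 1, rows - half - 1):
        for c in range(half + 1, cols - half - 1):
            patch = img[r - half:r + half + 1, c - half:c + half + 1]
            ssd = []
            for dr, dc in shifts:
                shifted = img[r - half + dr:r + half + 1 + dr,
                              c - half + dc:c + half + 1 + dc]
                ssd.append(np.sum((patch - shifted) ** 2))
            interest[r, c] = min(ssd)   # minimum over all shift directions
    return interest
```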

  • Interest-operator in a 1-D case: Image matching can be reduced to a one-dimensional problem, using the epipolar geometry of two images. In this case the aim is to match two intensity profiles. The effect of the interest operator in 1-D is identical to finding the zero crossings of the Laplacian, neglecting saddle points of the intensity function.

  • EXTRACTION OF EDGES

    General Principles for Edge Extraction: DEFINITION Referring to BALLARD AND BROWN 1983, ROSENFELD AND KAK 1982 and NALWA 1993, an edge is an image contour where a certain property like brightness, depth, color or texture (see Fig. 11a) changes abruptly perpendicular to the edge. Moreover, we assume that on each side of the edge the adjacent regions are homogeneous in this property. According to these characteristics, edges can be classified into two general types: step edges (edges) and bar edges (lines).

    Edges represent boundaries between two regions. The regions have two distinct (and approximately constant) pixel values; e.g., in an aerial image, two adjacent agricultural fields with different land use. Lines either occur at a discontinuity in the orientation of surfaces, or they are thin, elongated objects like streets in a small-scale image. The latter may appear dark on a bright background or vice versa. When the scale is large, the street appears as an elongated 2-D region with edges on both sides. To avoid conflicts in the symbolic image description it might be necessary to make an explicit distinction between edges and lines. REPRESENTATION Edge extraction usually leads to an incomplete description of the image, i.e. edges do not build closed boundaries of homogeneous image regions. The types of representation of single edges are manifold, depending on the intended use. The symbolic description of edges can be given, e.g., as a list containing geometric, radiometric (e.g. strength, contrast) and relational attributes (e.g. adjacent regions, junctions, etc.). The geometric attributes depend on the choice

  • of the approximation function (see step 5 below). For linear edges it is sufficient to specify the start and end point. APPLICATIONS Contrary to points as image features, one can argue that a list of all edges in an image contains all the desired image information, while its representation is much more compact and easier for a computer to interpret. To support this statement, consider again the image in Figure 3a. Just by looking at the edges it is possible to recognize the object. If, in addition, each edge had stored the brightness of its left and right adjacent regions, the information would be even more complete. Another justification could be based on information theory; COVER AND THOMAS (1991) wrote: the less often a certain structure can be found in an image, the more unexpected it is. This means that an unexpected structure contains much more information than a frequent one (like homogeneous regions). Because of their importance, edges can be used to solve a broad range of problems, some of which are: Relative orientation: Edge-based matching in stereo pairs is applied for relative orientation; e.g., LI AND SCHENK (1991) use curved edges. Absolute orientation: Matching edges with wireframe models of buildings can be used for absolute orientation. Object recognition and reconstruction: In many cases object models consist of structural descriptions of object parts. Straight lines often bound parts of man-made objects. The structural description based on edge extraction provides, besides its completeness, the highest geometrical accuracy. Models of the expected shape of object boundaries can easily be involved in the process, e.g. searching for straight lines. Therefore, extracting edges is widely used for object recognition. BASIC APPROACHES Both edge types can be detected by their discontinuity in the image domain, and in the following we will make no distinction between these types as long as it makes no difference for the algorithm. Since the beginning of digital image processing, edge detection has been an important and very active research area. As a result, a lot of edge detectors have been developed, which differ in the image or edge model they are based on, their complexity, flexibility and performance. In particular, the performance depends on 1) the quality of detection, i.e. the probability of missing edges and yielding spurious edges, and 2) the accuracy of the edge location. Unfortunately, the two criteria conflict.

  • Even a short description of all approaches is beyond the scope here, so we only outline the principles by looking at the main processing steps most edge detector algorithms have in common. A typical approach consists of five steps:

    1. Extraction of edge regions: extraction of all pixels which probably belong to an edge. The result is elongated edge regions.
    2. Extraction of edge pixels: extraction of the most probable edge pixels within the edge regions, reducing the regions to one-pixel-wide edge pixel chains.
    3. Extraction of edge elements (edgels): estimating edge pixel attributes, e.g. real-valued position of the edge pixels, accuracy, strength, orientation, etc.
    4. Extraction of streaks: aggregation or grouping of the edgels that belong to the same edge.
    5. Extraction of edges: approximation of the streaks by a set of analytic functions, for example polygons.

    In the following, the main objectives and the most common techniques of each step will be mentioned. Edge Regions: The aim of this step is to extract from an input image all pixels which are likely to be edge pixels. The extraction could be done by template matching, by parametric edge models or by gradients. Starting from an image with the intensity function g, the result is a binary image where all edge pixels are labeled. In addition, iconic features of each edge pixel, e.g. the edge magnitude and the edge direction, are extracted and stored, as they are required in subsequent steps. Template Matching: Edge templates are patterns which represent certain edge shapes. For each edge type (different edge models, different edge directions, edge widths and strengths) a special pattern is required. Operators can be found, e.g., in ROSENFELD AND KAK (1982). Gradient Operators (Difference Operators): The main idea of these approaches is that, in terms of differential geometry, the derivatives of an image intensity function g can be used to detect edges, which is more general than template matching procedures. The first step is to apply linear filters (convolution) to obtain difference (slope) images. The slope images represent the components of the gradient of g; from these, the edge direction and edge strength (magnitude) can be calculated for each pixel.
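    A sketch of the gradient-operator step just described: convolve with the Sobel difference kernels, then derive per-pixel edge strength and direction. scipy is an assumed dependency; any convolution routine would do.

```python
import numpy as np
from scipy.ndimage import convolve

SOBEL_R = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)   # derivative along rows
SOBEL_C = SOBEL_R.T                                # derivative along columns

def gradient_features(img):
    g = img.astype(float)
    gr, gc = convolve(g, SOBEL_R), convolve(g, SOBEL_C)
    magnitude = np.hypot(gr, gc)        # edge strength per pixel
    direction = np.arctan2(gr, gc)      # gradient direction in radians
    return magnitude, direction
```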

  • The convolution of the image with one of the many known difference operators is followed by a threshold procedure for distinguishing between the heterogeneous image areas, i.e. pixels with high gradients, and the homogeneous areas, i.e. pixels with low gradients (see Sec. 2.3). All pixels above a certain threshold are edge region pixels. Parametric Edge Models: An example of a parametric solution to edge detection is Haralick's facet model (HARALICK AND WATSON 1981), which can be used either for edge detection or for extracting regions and points. The idea is to fit local parts of the image surface g by a first-order polynomial f (sloped planes or facets). Three parameters α, β and γ represent the facet f and can be evaluated by least squares estimation. The model is given by g(r, c) = α·r + β·c + γ + n(r, c), where α and β are the slopes along the two coordinate axes r and c, γ is the altitude of the facet and n(r, c) the image noise. HARALICK AND SHAPIRO 1992 showed that the result of this approach is identical to the convolution with a difference operator. The classification of edge pixels is a function of the estimated slopes (α, β): if the slopes are greater than a given threshold and, in addition, the variances are small enough (to avoid noisy image areas, which are assumed to be horizontal), the pixel belongs to an edge region.
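    A minimal sketch of the facet model for a single window, under the formulation above: fit the plane g(r, c) ≈ α·r + β·c + γ by least squares and call the centre pixel an edge-region pixel if the slope magnitude exceeds a threshold. The threshold value is illustrative.

```python
import numpy as np

def facet_fit(window):
    """Fit a sloped plane to a small square patch; returns (alpha, beta, gamma)."""
    n = window.shape[0]
    r, c = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    A = np.column_stack([r.ravel(), c.ravel(), np.ones(n * n)])
    params, *_ = np.linalg.lstsq(A, window.ravel().astype(float), rcond=None)
    return params                        # alpha, beta, gamma

def is_edge_region_pixel(window, slope_threshold=10.0):
    alpha, beta, _gamma = facet_fit(window)
    return np.hypot(alpha, beta) > slope_threshold
```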

  • Edge Pixels: Due to low contrast, image noise, image smoothing, etc., the first step leads to edge regions which are possibly more than one pixel wide. The aim of this step is to thin the edge regions to one-pixel-wide edge chains. These pixels should represent the real edges with the highest probability. Assuming the real edge is located on the mid-line (skeleton) of the edge regions, thinning or skeleton algorithms can be applied. Obviously these mid-lines of edge areas are not necessarily identical to the real edges. To improve the accuracy of edge location, properties of the pixel like the gradient or the Laplacian may be used for extracting the most probable location of the edges. This can be done by the analysis of the local neighbourhood of each pixel (non-maxima suppression) or by global techniques (relaxation, Hough transformation). Non-maxima suppression is the most widely used method.

    Non-Maxima Suppression: The process consists of two steps: 1) the selection of the neighbour pixels in the gradient direction, which have to be used for the comparison; 2) suppression of pixels which are found to have lower gradient magnitudes than their neighbours. An example for (1) is given by CANNY (1983): his algorithm is defined in an N8 neighbourhood (see Fig. 7 and Fig. 7a). Given an edge pixel (r, c) and its gradient direction g perpendicular to the edge e, the first step is the estimation of the two points P1 and P2. The gradient magnitudes for P1 and P2 can be approximated by a simple linear interpolation of the gradient magnitudes of the two adjacent pixels. The location of the edges can also be determined by analyzing the zero crossings of the Laplacian. One problem is, however, that zero crossings occur both at the extrema of the gradient function and at saddle points of g. The saddle points should be neglected.
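    A minimal non-maxima-suppression sketch in the spirit of the description above, with the gradient direction quantised to the N8 neighbourhood instead of Canny's linear interpolation of P1 and P2:

```python
import numpy as np

def non_maxima_suppression(magnitude, direction):
    """Keep a pixel only if its gradient magnitude is a local maximum
    along the (quantised) gradient direction."""
    out = np.zeros_like(magnitude)
    angle = (np.rad2deg(direction) + 180.0) % 180.0   # fold to [0, 180)
    for r in range(1, magnitude.shape[0] - 1):
        for c in range(1, magnitude.shape[1] - 1):
            a = angle[r, c]
            if a < 22.5 or a >= 157.5:        # compare left/right neighbours
                p1, p2 = magnitude[r, c - 1], magnitude[r, c + 1]
            elif a < 67.5:                    # one diagonal
                p1, p2 = magnitude[r - 1, c + 1], magnitude[r + 1, c - 1]
            elif a < 112.5:                   # compare up/down neighbours
                p1, p2 = magnitude[r - 1, c], magnitude[r + 1, c]
            else:                             # the other diagonal
                p1, p2 = magnitude[r - 1, c - 1], magnitude[r + 1, c + 1]
            if magnitude[r, c] >= p1 and magnitude[r, c] >= p2:
                out[r, c] = magnitude[r, c]
    return out
```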

  • After the selection of the edge pixels by non-maxima suppression, the edge areas are in most cases reduced to thin lines. Due to the discrete image raster and image noise, edge regions might occur which are still more than one pixel wide. In this case subsequent thinning is required.

    Edge Elements: The extraction of edgels is the first transition stage from the edge pixels in the discrete image domain to the symbolic description of the edge. This step comprises the estimation of those properties of the edge pixels required for subsequent interpretation processes (e.g. real-valued coordinates, contrast, sharpness, strength, type), which are stored as attributes of the symbolic edge elements.

  • Edge Streaks: The next step is to group all edgels which belong to the same edge. One can say that the real detection of the image feature 'edge' happens now, although the edge is still represented as a list of edgels. The aggregation of the edge elements can be done using local (edge tracking) or global techniques (Hough transformation, dynamic programming, heuristic search algorithms). The grouping process should ensure that each streak 1) consists of connected edgels, where each pixel pair is connected by a non-ambiguous pixel path, and 2) delineates at most two regions (usually edges delineate two regions, except dead ends or open edges, which are surrounded by the same region). To satisfy the second criterion we define a streak as an edge pixel chain between two edge pixels which are either end pixels and/or node pixels. According to the number of neighbours in an N8 neighbourhood, we classify the pixels as node, line or end pixels as shown in Fig. 9. Given the classification, the easiest aggregation method is an edge following or edge tracking algorithm: first one looks for an unlabeled edge pixel, i.e. an edge pixel that does not yet belong to an edge. If one is found, all its direct and indirect neighbours are tracked until an end or node pixel appears. All these collected edge pixels belong to one edge and are labeled with a unique edge number.
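    A small sketch of the pixel classification used above: count the N8 neighbours of every pixel in a binary edge image and label it as an end pixel (one neighbour), line pixel (two) or node pixel (three or more). scipy is an assumed dependency.

```python
import numpy as np
from scipy.ndimage import convolve

N8_KERNEL = np.array([[1, 1, 1],
                      [1, 0, 1],
                      [1, 1, 1]])

def classify_edge_pixels(edge_mask):
    """edge_mask: boolean image of edge pixels after thinning."""
    neighbours = convolve(edge_mask.astype(int), N8_KERNEL, mode="constant")
    counts = np.where(edge_mask, neighbours, 0)
    end_pixels = counts == 1
    line_pixels = counts == 2
    node_pixels = counts >= 3
    return end_pixels, line_pixels, node_pixels
```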

  • Edge Approximation: Up to now, the extracted streaks are still defined in the discrete image model, as they are represented by a set of connected edge elements. Thus, for deriving a symbolic description of the edges, a last processing step is required. This step is very important since the representation domain changes from the discrete image raster to a continuous image model, the plane.

    It is not obvious how to approximate a list of edgels by an analytic representation. For example, you could apply curve-fitting techniques like splines, Bezier curves, Fourier series, etc. This may give you smooth curves and probably better visual

  • results, but it would be too much hassle if you only look for straight lines. Furthermore, a polygon as a set of straight lines can also approximate a curved edge. As usual, the choice of the approximation depends on what you want (or what the application requires). Here we look at straight-line fitting. Approximation by Straight Lines: For the approximation of the edges by straight lines, many different approaches are possible, like merging, splitting or split-and-merge algorithms. The critical point is to find the breakpoints or corners which lead to the best approximation. The merging algorithm sequentially follows an edge and considers each pixel to belong to a straight line as long as it fits the line. If the current pixel does not fit anymore, the line ends and a new breakpoint is established. A disadvantage of this approach is its dependency on the merging order: starting from the other end of the edge would probably lead to different breakpoints. Splitting algorithms recursively divide the edges into (usually) two parts until the parts fulfill some fitting condition. Consider an edge consisting of a sequence of edge pixels P1, P2, ..., Pn; P1 and Pn, being the end points, are joined by a chord. For each pixel of the edge, the distance to that chord is calculated. If the maximum distance is larger than a given threshold, the edge segment is divided into two new segments at the position where the maximum distance was found (see the sketch below). It is possible to combine the advantages of the merging and the splitting methods by developing a split-and-merge algorithm: first we split, and then we do a merging step by grouping lines if the new line fits the streak well enough, see Fig. 10. The accuracy of the symbolic description, i.e. the edge parameters, can be improved by applying a least squares estimation taking into account all edgels belonging to one edge. The observation values are given by the real-valued coordinates (xi, yi) of each edgel, and the weights are defined by, e.g., the squared gradient magnitude. The covariance matrix of the estimated edge parameters contains the accuracy of the edge. Thus, the uncertainty of the discrete image information is preserved in the accuracy of the edges, which can be important for the image interpretation processes.
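    A sketch of the splitting scheme just described (essentially the Ramer-Douglas-Peucker recursion): join the end points by a chord, find the edgel farthest from it, and split there if that distance exceeds a tolerance. The tolerance value is illustrative.

```python
import numpy as np

def split_polyline(points, tol=1.5):
    """points: (n, 2) array of edgel coordinates; returns breakpoint list."""
    points = np.asarray(points, dtype=float)
    if len(points) <= 2:
        return [points[0], points[-1]]
    p1, pn = points[0], points[-1]
    chord = pn - p1
    d = points - p1
    # Perpendicular distance of every edgel to the chord p1 -> pn.
    cross = chord[0] * d[:, 1] - chord[1] * d[:, 0]
    dist = np.abs(cross) / (np.linalg.norm(chord) + 1e-12)
    k = int(np.argmax(dist))
    if dist[k] <= tol:
        return [p1, pn]                       # the segment fits a straight line
    left = split_polyline(points[:k + 1], tol)
    right = split_polyline(points[k:], tol)
    return left[:-1] + right                  # drop the duplicated breakpoint
```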

  • Extraction of Regions

    General Principles for Region Extraction: DEFINITION Regions are image areas which fulfill a certain similarity criterion; we call such regions blobs. A similarity or homogeneity criterion could be the intensity value of the image pixels or some texture property of the area surrounding the pixel. The result of such a region extraction should divide, or segment, the image into a number of blobs. Ideally, the union of these blobs gives the image again. The regions themselves should be connected and bounded by simple lines. REPRESENTATION Depending on the strategy of the region extraction, we distinguish between different segmentation results. Incomplete segmentation: The image is first divided into homogeneous and heterogeneous areas. The latter (we call those areas background) do not fulfill the homogeneity criterion and therefore do not fulfill the above definition exactly. Complete segmentation: The image is completely divided into regions fulfilling the definition given above for the discrete image, too. That might lead to conflicting topology of the image regions, depending on the definition of the neighborhood (N8 or N4) (see PAVLIDIS 1977), but also to inaccurate region boundaries, depending on the cost of the approach. The final symbolic representation of blobs consists of geometric, radiometric and relational attributes. A blob itself can be represented by its boundaries (if the blob contains holes, the blob has more than one boundary) or by a list of the pixels inside the blob. Blob boundaries define the location of the blob; representing blob boundaries is equivalent to representing image edges. Geometric attributes of blobs are size, shape, center of gravity, mean direction, etc. Algorithms for extracting these attributes can be found in the literature, particularly in the field of binary image analysis. Radiometric attributes are, e.g., the mean intensity within the blob, the variances of the intensities, and texture parameters. Lists of adjacent blobs, mutual boundaries, junctions and corners are examples of relational attributes.

  • APPLICATIONS Region information has the advantage that it covers geometrically large parts of the image. Therefore it can be used for several applications like compression or interpretation tasks. Data compression: Grouping all pixels which are connected in image space and have similar properties into one object (i.e. the blob), and representing the object by characteristic attributes, reduces the amount of data and the redundancy of information. Analysing range images: Region-based segmentation algorithms were found to be more robust when analysing range images. Binary image analysis: In many cases region extraction is a prerequisite for binary image analysis, widely used in industrial applications. High-level image interpretation: In many cases object models consist of the structural description of object parts, where the interior of each part is assumed to have similar surface and reflectance properties. Therefore, extracting blobs and their attributes is quite useful for object recognition. BASIC APPROACHES Given a digital image with a discrete image function, region extraction is the process of grouping pixels into regions according to connectivity and similarity (homogeneity). The large number of region extraction methods can be classified in several ways. One possibility is to separate the methods by the number of pixels used for the grouping decision; they are accordingly called local or global techniques. Further, we distinguish the methods depending on where the grouping is done. In the first approach, the grouping process is defined in the image domain. That means that the decision whether connected pixels can be merged or should be split is made directly by analysing the properties of adjacent pixels. Thus, both the similarity and the connectivity are considered in one processing step. Examples of this type are region growing or region merging, region splitting, and split-and-merge algorithms. The second approach applies the similarity and connectivity evaluation in two separate steps. The goal is first to analyse the discriminating properties of the pixels of the entire image and use the result to define several classes of objects; examples are thresholding and cluster techniques. This is done outside the image raster by storing all pixel properties in a so-called measurement space (e.g. a histogram). Then the definition of the classes can be used to classify the pixels: going back to the image domain, each pixel is labeled with the identity number of its class. In the second step, pixels of the same class which are

  • also connected in the image space are grouped into homogeneous regions. Connected-components algorithms can easily do this. In the following, a short overview is given of thresholding techniques, region growing/merging and split-and-merge approaches. An overview of further region-based segmentation techniques can be found in (HARALICK AND SHAPIRO 1985) or (ZUCKER 1976). Thresholding Techniques: Thresholding techniques consist of 4 steps (steps 1 and 2 are not necessary when the thresholds are known in advance):

    1. Determination of the histogram.
    2. Choosing the thresholds: The choice of the thresholds is the most sensitive and the most difficult step. Unfortunately, it is not always the case that the peaks of the histogram (there may be more than two) are clearly separated by valleys. Also, the histogram often contains many local valleys which are probably not interesting. A survey of several techniques for estimating thresholds automatically can be found in (HARALICK AND SHAPIRO 1992).
    3. Labeling or classification of the pixels: Once the thresholds are determined, the pixels can be classified easily. The result of the labeling process can be called a segmented image, because the labels are associated with object classes.
    4. Extraction of blobs by connected components: This processing step performs the change from single pixels to blobs. Pixels that are labeled with the same number must be connected by at least one pixel path in which all pixels carry the same label. The connectivity can be defined in an N8 or N4 neighborhood. Connected-components algorithms are usually defined on binary images; a description and comparison can be found, e.g., in (HARALICK AND SHAPIRO 1992) and (ROSENFELD AND KAK 1982). After this step, every pixel is labeled with a value associated with the number of the blob the pixel belongs to.

    Thresholding techniques work well and fast if the objects that have to be recognised or analysed are not too complex. This is the case for many industrial applications. The main problem is the automatic estimation of the thresholds. Even when the peaks are well separated, the threshold result may not lead to accurate regions. Moreover, they may produce holes and ragged boundaries, because the similarity grouping is performed in the measurement space and not in the image domain. In this sense, threshold techniques may not fulfill the criteria of a good region extraction method. A sketch of the four steps follows.
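    A minimal sketch of the four steps, assuming Otsu's method (one of many automatic threshold estimators) for step 2 and scipy for the connected-components step:

```python
import numpy as np
from scipy.ndimage import label

def threshold_and_extract_blobs(img):
    # Step 1: histogram of the image intensities.
    hist, bin_edges = np.histogram(img, bins=256)
    levels = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    # Step 2: automatic threshold by maximising between-class variance (Otsu).
    p = hist.astype(float) / hist.sum()
    w0 = np.cumsum(p)                         # class-0 probability
    mu = np.cumsum(p * levels)                # cumulative mean
    between = (mu[-1] * w0 - mu) ** 2 / (w0 * (1.0 - w0) + 1e-12)
    t = levels[np.argmax(between)]
    # Step 3: label each pixel against the threshold.
    mask = img > t
    # Step 4: connected components (N8 connectivity), one label per blob.
    blobs, n_blobs = label(mask, structure=np.ones((3, 3), dtype=int))
    return blobs, n_blobs, t
```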

  • Region Growing / Region Merging: As the name indicates, region growing and region merging methods follow a bottom-up approach: starting from a single pixel or a small region (the seed or seed region), the region extraction is done by appending to the expanding region all adjacent pixels which fulfill a certain similarity criterion. If the image consists of more than one region (which is normally the case), a separate region growing process is required for each of them, which can be done sequentially or in parallel. The process consists of the following steps. Determination of the seeds: The determination of the seeds must ensure that every region which has to be extracted contains at least one starting point. If the number and positions of the seeds are known in advance, region growing can be applied. If the seeds are not given, they may be defined by each pixel of the image raster; in this case a region merging procedure is required. However, the subsequent region-growing step probably produces many small adjacent regions which are not significantly different from each other, so further processing steps are required to merge as many regions as possible, if they are considered to be similar enough. Region growing starts at the seeds and stops when all pixels are labeled. Referring to HARALICK AND SHAPIRO 1985, the region growing techniques can be distinguished by the number of pixels involved in the grouping decision, i.e. in the evaluation of the homogeneity. In the easiest case the growing algorithm consists just of the comparison of two adjacent pixels. It is obvious that the result is very sensitive to noisy data. Less sensitivity to noise

  • can be obtained by investigating not only the pixel properties themselves, but a mean property of the local neighbourhood or the properties of already extracted regions (see the sketch below). Local neighbourhood properties are, e.g., the mean values and variances, but also gradients or Laplacians. The latter are also used by many edge detectors. Using the gradient or Laplacian, edges and regions can be extracted by the same operator, which directly takes the duality of regions and edges into account. Combinations of different techniques provide further improvements by deliberately exploiting their positive properties. Criteria are the accuracy of the regions, the ability to decide whether regions are significantly different, the ability to place boundaries in weak areas, and the robustness to noisy data. Region Merging: Assuming the image area is completely partitioned into regions, the aim is to merge adjacent regions which are not significantly different. The main problem of region extraction by region growing algorithms is the question of the merging order. Except for methods working in a highly parallel manner (e.g. relaxation techniques), the result depends on which region was extracted first and which of the adjacent pixels or regions are attended to first (usually more than one neighbour fulfils the homogeneity criterion). The determination of the best merging candidate is a time-consuming search problem and is difficult to solve. Less complex approaches consist of well (and locally) defined merging rules.
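    A minimal region-growing sketch under the description above: starting from a seed pixel, N4 neighbours are appended as long as their intensity stays close to the running region mean. The tolerance is an illustrative parameter.

```python
import numpy as np
from collections import deque

def region_grow(img, seed, tol=10.0):
    """img: greyscale image; seed: (row, col) tuple; returns boolean region mask."""
    img = img.astype(float)
    grown = np.zeros(img.shape, dtype=bool)
    grown[seed] = True
    total, count = img[seed], 1
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < img.shape[0] and 0 <= nc < img.shape[1] \
                    and not grown[nr, nc] \
                    and abs(img[nr, nc] - total / count) <= tol:  # homogeneity test
                grown[nr, nc] = True
                total += img[nr, nc]
                count += 1
                queue.append((nr, nc))
    return grown
```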

    Split and Merge: The splitting algorithm is a process of dividing the image area successively into sub-areas until the sub-areas satisfy a certain homogeneity criterion. To improve efficiency, the partitioning into sub-areas can be done regularly, i.e. by dividing the still inhomogeneous areas into quarters. This regularity causes squared, artificial and also inaccurate boundaries. To cope with this problem, combinations of split and merge were developed: the strategy starts from any given partition. Adjacent regions are merged if the result

  • is homogeneous; single regions are split if they do not meet the homogeneity criterion. The process continues until no more merging or splitting can be done. A further advantage of this method is that it is faster than a single splitting or merging process. Drawbacks: The independent application of the techniques presented here reveals a number of drawbacks:

    Techniques aiming at complete partitioning of the image area like region-based approaches lead to uncertain or even artificial boundaries.

    Region-based techniques conceptually are not able to incorporate mid-level knowledge such as the straightness of the boundaries.

    Edge-based techniques normally cannot guarantee closed boundaries and thus do not lead to a complete partitioning. Edges are likely to be broken, or may not represent the boundaries of the regions (spurious edges), because of image noise.

    Corner detectors usually don't work at junctions. All point detectors have difficulties at smooth corners.

    The models used are either wrong or at least not adaptive to the local image content (e.g. edge detection at junctions).

    To avoid inconsistencies, all three feature types could be extracted simultaneously and thus embedded in the same model. A complete feature extraction using points, lines, regions and their relations leads to a richer and also topologically more complete description of the image. Such an integrated approach (polymorphic feature extraction) is addressed in (LANG AND FÖRSTNER 1996).

  • Expert System for Information Extraction: As mentioned above, high-resolution imagery from both aerial and spaceborne sensors presents a challenge to the user community in terms of information extraction. The human eye and brain can identify objects in the image, but the computer finds this difficult. If we cannot automate this process, then we will most certainly lose out on some of the major economic benefits of the imagery. If the human brain can do it, why can't the computer? Well, it actually can, if it uses rule- or knowledge-based processing, just as the human brain does. The brain can make a decision on an image very quickly by understanding and using context. If we see grassland in the center of an urban development, we can easily decide that it is a park, as opposed to agricultural land. To make this decision we are using knowledge and experience to create expertise, and computer-based expert systems are beginning to emerge that mimic this process. For many years, expert systems have been used successfully for medical diagnoses and various information technology (IT) applications, but only recently have they been applied successfully to GIS applications. Statistical image processing routines, such as maximum likelihood and ISODATA classifiers, work extremely well at performing pixel-by-pixel analyses of images to identify land-cover types by common spectral signature. Expert-system technology takes the classification concept a giant step further by analyzing and identifying features based on spatial relationships with other features and their context within an image. Expert systems contain sets of decision rules that examine spatial relationships and image context. These rules are structured like tree branches, with questions, conditions and hypotheses that must be answered or satisfied. Each answer directs the analysis down a different branch to another set of questions.

    The beauty of an expert system is that, because true experts, such as foresters or geologists, create the rules (also called a knowledge base), non-experts can use the system successfully. In terms of satellite images, the knowledge base identifies features by applying questions and hypotheses that examine pixel values, relationships with other features and spatial conditions, such as altitude, slope, aspect and shape. Most importantly, the knowledge base can accept inputs of multiple data types, such as digital elevation models, digital maps, GIS layers and other pre-processed thematic satellite images, to make the necessary assessments.
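    A toy sketch of a single decision rule of the kind described above, re-labelling grassland that is surrounded by urban land cover as 'park'. The class codes, neighbourhood window and urban-fraction threshold are all illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

GRASS, URBAN, PARK = 1, 2, 3   # hypothetical class codes in a classified raster

def apply_park_rule(classified, urban_fraction=0.6, window=15):
    """Re-label grass pixels whose neighbourhood is predominantly urban."""
    urban_density = uniform_filter((classified == URBAN).astype(float), size=window)
    refined = classified.copy()
    refined[(classified == GRASS) & (urban_density > urban_fraction)] = PARK
    return refined
```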

  • Automatic Information Extraction

    In recent years it has become clear that most of the value of a Geographic Information System lies in its data, rather than in its hardware or software. For data to be valuable, they need to be up to date in terms of completeness, consistency and accuracy. Mapping is often posed as an end-to-end process, where new source imagery is collected to meet certain project specifications and the entire compilation process is performed using a homogeneous set of spatially and temporally consistent data sources. In contrast, other mapping applications require the ability to perform incremental updates of existing spatial databases from a variety of disparate sources. Thus, a timely revision of GIS databases plays a major role in the overall process of acquiring, manipulating, analyzing, and presenting topographic data.

    Besides techniques like the digitization of large maps and terrestrial surveys, photogrammetry seems to be especially well suited for generating or updating GIS databases, since it has already had a major impact in traditional map updating. Digital photogrammetry based on digital images has the potential to further increase this impact, mainly due to the possibility of at least partly automating, and thus speeding up, the generation and/or revision process.

    Except for the manual or semi-automatic measurement of ground control points, almost all steps are automated, but frequently some manual post-editing is required. Image matching, for example, still has problems in built-up areas and has limitations in forest areas. No robust solutions for break-line detection or object extraction in these data exist so far. Degree of automation: The automated extraction of 3D objects like buildings, roads, bridges, street furniture or vegetation is not yet widely used in practice, mainly due to the major technical problems involved. In order to solve the object extraction task, methods of image understanding and image interpretation are applied. Keywords are image segmentation, object modeling and information fusion, used in order to detect and reconstruct 3-D objects from 2-D images. Major research efforts are currently put into the extraction of man-made structures like roads and buildings from digital aerial imagery and from space imagery. The approaches range from manual methods to semi-automated and automated feature extraction methods from single and multiple image frames. New developments in high-resolution space sensors might allow medium- and large-scale mapping from space. Linear objects like roads, railroads or river networks have long attracted researchers, but due to the limited resolution of space imagery they could not successfully be extracted for mapping at medium or large scales. With new high-resolution sensors offering ground sampling distances of 1 m-5 m, and now 1 m and better, the possibilities to extract linear objects have increased dramatically.

  • Schenk (Schenk, 1999) proposes the expression autonomous for a system that can perform independently of human interaction. Even those which are called automatic (like automatic DEM generation) are not purely automatic, as they solve the task only up to a certain percentage of errors. Extending this, Heuel (Heuel, 2000) proposes classifying the automation degree of systems using the terms quantitative and qualitative interaction: methods are defined as automatic if only simple yes/no decisions or a selection of alternatives, i.e. qualitative interaction, are needed; they are regarded as semi-automatic if qualitative decisions and quantitative input parameters are needed. We need to initialize the extraction process, we might need to interact during run-time, and we certainly need to validate or correct the results. The less interaction we need, the higher the degree of automation. We expect from the integration of automatic processes that the overall efficiency of the system is increased, but we know that those processes can give erroneous results, which are costly for the user and thus may decrease the efficiency of the system. We may want to reduce the level of training by avoiding complexity and skill requirements in decision-making, but we also want to reduce the number of manual actions in the collection phase. Here we should refer not only to the amount of human interaction in terms of time and number of mouse operations, but also to the type of interaction needed. We certainly have to select parameters according to the task we want to solve and the data which are available; this is valid for all systems. We need to give the image numbers of overlapping photographs, we need to define the units (m or feet), or we need to give the type of features searched for, such as buildings and/or roads. We have to provide instructions on how to collect buildings in an interactive system, or we need to give a set of building models and some min-max values if we want to extract them automatically. If we need to get more deeply involved in the algorithms, we might need to give thresholds and steering parameters (window sizes, minimal angle difference, minimal line length in the image, etc.), which are not always interpretable; sometimes it is difficult to connect them to the task and image material. This also holds for some stopping criteria for the algorithms, like the maximal number of iterations. The type of post-editing can also vary: we might need to correct single vertex or corner points, or the topology of whole structures, or we need to check manually for completeness. Summarizing the above statements, we propose the following scheme, starting from an interactive system, where we can solve all tasks required, to a semi-automatic system, where we interact during the measurement phase, to an automated system, where the interaction is focused at the beginning and the end of the automatic process, and finally to an autonomous system, which is beyond the horizon right now.

    1. Interactive system (purely manual measurement, no automation for any measurement task).

  • 2. Semiautomatic system (interactive environment and integration of automatic modules in the workflow).

    3. Automated system (interactive environment with interaction before and after the automatic phase).

    4. Autonomous system.

  • Cartographic Feature Extraction

    Of all tasks in photogrammetry, the extraction of cartographic features is the most time-consuming. Since the introduction of digital photogrammetry, much attention has therefore been paid to the development of tools for a more efficient acquisition of cartographic features. Fully automatic acquisition of features like roads and buildings, however, appears to be very difficult and may even be impossible. The extraction of cartographic features from digital aerial imagery requires interpretation of this imagery. The knowledge one needs about the topographic objects and their appearance in aerial images in order to recognize these objects and extract the relevant object outlines is difficult to model and to implement in computer algorithms. Therefore, only limited success has been obtained in developing automatic cartographic feature extraction procedures. Human operators appear to be indispensable for a reliable interpretation of aerial images. Still, computer algorithms can contribute significantly to the improvement of the efficiency of feature extraction from aerial imagery. Whereas human operators are better at interpretation, computer algorithms often outperform operators in specific measurement tasks. So-called semi-automatic procedures therefore combine the interpretation skills of the operator with the measurement speed of a computer. This paper reviews the most common strategies for semi-automatic cartographic feature extraction from aerial imagery. In several strategies, knowledge about the features to be extracted can easily be incorporated into the measurement part performed by a computer algorithm. Some examples of the usage of such knowledge will be described in the discussion at the end of this paper. Semi-automatic feature extraction: Semi-automatic feature extraction is an interactive process between an operator and one or more computer algorithms. To initiate the process, the operator interprets the image and decides which features are to be measured and which algorithms are to be used for this task. By positioning the mouse cursor, the approximate location of a feature is pointed out to the algorithm. If required, the operator may also tune some of the algorithm's parameters and select an object model for the current feature. Semi-automatic feature extraction algorithms have been developed for measuring primitive features such as points, lines and regions, but also for more complex, often parameterized, objects.

Extraction of points: Semi-automatic measurement of points is used for measuring height points as well as for measuring specific object corners. The first case is usually known as a "cursor on the ground" utility, which is available in several commercial digital photogrammetric workstations. Here, the operator positions the cursor at some XY-position in a stereoscopic model, while the terrain height at this position is determined automatically by matching patches of the stereo image pair. After this determination the 3D cursor snaps to the local terrain surface. Thus, the operator is relieved from a precise stereoscopic measurement and can therefore increase the speed of data acquisition. The second type of point measurement algorithm is used to make the cursor snap to a specific object corner. These algorithms can be used for monoplotting as well as for stereoplotting. For monoplotting, the operator approximately indicates the location of an object corner to be measured. The image patch around this approximate point will usually contain grey value gradients caused by the edges of the object. By applying an interest operator (see e.g. [Förstner and Gülch, 1987]) to this patch, the location of the object corner can be determined. Thus, such utilities can make the cursor snap to the nearest point of interest. When using the same principle for stereoplotting, the operator has to supply an approximate 3D position of the object corner. The interest operator can then be applied to both stereo images, while the estimated 3D corner position is constrained by the known epipolar geometry. For the measurement of house roof corners, this procedure was reported to double the speed of data acquisition and reduce operator fatigue [Firestone et al., 1996].
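As an illustration of the monoplotting case, the following minimal sketch snaps an approximate cursor position to the strongest interest point inside a small search window. It is only a sketch: the Harris corner measure stands in for the Förstner operator cited above, a single-band NumPy image is assumed, and all function and parameter names (snap_to_corner, half, win) are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def harris_response(patch, k=0.04, win=3):
    """Harris corner measure, used here as a stand-in for the
    Foerstner interest operator mentioned in the text."""
    gy, gx = np.gradient(patch.astype(float))
    # Window the structure-tensor entries; without smoothing the
    # pointwise tensor is rank one and its determinant vanishes.
    ixx = uniform_filter(gx * gx, win)
    iyy = uniform_filter(gy * gy, win)
    ixy = uniform_filter(gx * gy, win)
    return ixx * iyy - ixy * ixy - k * (ixx + iyy) ** 2

def snap_to_corner(image, row, col, half=10):
    """Snap an approximate cursor position to the strongest interest
    point inside a (2*half+1) x (2*half+1) search window."""
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    response = harris_response(patch)
    dr, dc = np.unravel_index(np.argmax(response), response.shape)
    return row - half + dr, col - half + dc
```

In a workstation setting, the returned pixel would simply become the new cursor position, relieving the operator from the precise pointing.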

Extraction of lines: The extraction of lines from digital images has been a topic of research for many years in the area of computer vision [Rosenfeld, 1969, Hueckel, 1971, Davis, 1975, Canny, 1986]. First attempts to extract linear features from digital aerial and space imagery were reported in [Bajcsy and Tavakoli, 1976, Nagao and Matsuyama, 1980]. Semi-automatic algorithms have been developed for the extraction of roads. These algorithms can be classified into two categories: algorithms using deformable templates and road trackers.

Deformable templates: Before starting an algorithm using deformable templates, the operator needs to provide the approximate outline of the road. This initial template of the road is usually represented by a polygon with a few nodes near the road to be measured. The task of the algorithm is to refine the initial template to a new polygon with many more nodes that accurately outline the road edges or the road centre (depending on the road model used). This is achieved by deforming the template such that a combination of two criteria is optimised: the template should coincide with image pixels with high grey value gradients, and the shape of the template should be relatively smooth. The latter criterion is often accomplished by constraining the (first and) second derivatives of the template. This constraint is needed for regularisation but also leads to more likely outline results, since road shapes generally are quite smooth. Most algorithms of this kind are based on so-called snakes [Kass et al., 1988]. The snakes approach uses an energy function in which the two optimisation objectives are combined. After computing the energy gradients due to changes in the positions of the polygon nodes, the optimal direction for the template deformation can be found by solving a set of differential equations. In an iterative process the polygon nodes are shifted in this optimal direction. The resulting behaviour of the template looks like that of a moving snake, hence the name. Whereas snakes were initially formulated for optimally outlining linear features in a single image, they can also be used to outline a feature in 3D object space by combining grey value gradients from multiple images together with the exterior orientation of these images [Trinder and Li, 1995, Neuenschwander et al., 1995]. This snakes approach has also been extended to outline both sides of a road simultaneously. Further research is being conducted to improve the efficiency of mapping with snakes by reducing the requirements on the precision of the initial template provided by the operator and by incorporating scene knowledge into the template deformation process [Neuenschwander et al., 1995, Fua, 1996].
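The sketch below shows one semi-implicit iteration of such a snake in the spirit of Kass et al. [1988]: tension (alpha) and rigidity (beta) terms keep the closed polygon smooth while an external force pulls the nodes towards high gradient magnitude. The array layout, parameter values and function name are illustrative assumptions, not a specific published implementation.

```python
import numpy as np

def snake_step(nodes, grad_mag, alpha=0.1, beta=0.05, gamma=1.0):
    """One semi-implicit update of a closed snake; 'nodes' is an (n, 2)
    array of (row, col) positions, 'grad_mag' the image gradient
    magnitude that the snake should be attracted to."""
    n = len(nodes)
    # Second-difference matrix for a closed polygon (tension term).
    d2 = np.roll(np.eye(n), 1, 0) - 2 * np.eye(n) + np.roll(np.eye(n), -1, 0)
    # Internal energy: alpha penalises stretching, beta penalises bending.
    a = -alpha * d2 + beta * (d2 @ d2)
    # External force: move nodes uphill on the gradient-magnitude image.
    fy, fx = np.gradient(grad_mag.astype(float))
    rc = np.clip(np.round(nodes).astype(int), 0,
                 np.array(grad_mag.shape) - 1)
    force = np.stack([fy[rc[:, 0], rc[:, 1]],
                      fx[rc[:, 0], rc[:, 1]]], axis=1)
    # Semi-implicit Euler step, as in the original snake formulation.
    return np.linalg.solve(gamma * np.eye(n) + a, gamma * nodes + force)
```

In practice the step is repeated until the node displacements fall below a tolerance; the operator's rough polygon provides the initial value of nodes.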

Road trackers: In the case of snakes, the operator needs to provide a rough outline of the complete road to be measured. In contrast, the input for road trackers only consists of a small road segment outlined by the operator. The purpose of the road tracker is then to find the adjacent parts of the road. Most road trackers are based on matching grey value profiles [McKeown and Denlinger, 1988, Quam and Strat, 1991, Vosselman and Knecht, 1995]. Based on the initial road segment outlined by the operator, a characteristic grey value profile of the road is derived. Furthermore, the local direction and curvature of the road are estimated. This estimation is used to predict the position of the road at some step size beyond the initial road segment. At this position, and perpendicular to the predicted road direction at this position, a grey value profile is extracted from the image. By matching this profile with the characteristic road profile, a shift between the two profiles can be determined. Based on this shift, an estimate for the road position along the extracted profile is obtained. By incorporating previously estimated positions, other road parameters like the road direction and the road curvature can also be updated. The updated road parameters can then be used to make the next prediction of the road position at some step size further along the road. This recursive process of prediction, measurement by profile matching and updating of the road parameters can be implemented elegantly in a Kalman filter [Vosselman and Knecht, 1995]. The road tracking continues until the profile matching fails at several consecutive predicted positions, i.e. it stops when several consecutive extracted profiles show little correspondence with the characteristic grey value profile. Some characteristic results are shown in figure 3. Matching failures can often be explained by trees along the road or by road crossings and junctions. Due to these objects, the grey value profiles extracted at those positions deviate substantially from the characteristic profile. By making predictions with increasing step sizes, the road tracker is often able to jump over these kinds of obstacles and continue the outlining of the road.
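A much-reduced sketch of this predict/match/update loop follows. It assumes a caller-supplied extract_profile(position, heading) function that samples grey values perpendicular to the road (its name and signature are invented for the example), and it replaces the Kalman filter of Vosselman and Knecht [1995] with a simple heading update; the step size grows when matching fails, mimicking the obstacle-jumping behaviour described above.

```python
import numpy as np

def match_profile(reference, profile):
    """Across-road shift that best aligns an extracted profile with the
    characteristic road profile (normalised cross-correlation)."""
    n, m = len(reference), len(profile)
    ref = (reference - reference.mean()) / (reference.std() + 1e-9)
    best_shift, best_score = 0, -np.inf
    for s in range(m - n + 1):
        win = profile[s:s + n]
        score = np.dot(ref, (win - win.mean()) / (win.std() + 1e-9)) / n
        if score > best_score:
            best_shift, best_score = s - (m - n) // 2, score
    return best_shift, best_score

def track_road(extract_profile, start, heading, reference,
               step=10.0, min_score=0.6, max_misses=3):
    """Recursive prediction / profile-matching loop of a road tracker."""
    pos = np.asarray(start, dtype=float)
    path, misses, cur_step = [pos.copy()], 0, step
    while misses <= max_misses:
        # Predict the next road position along the current heading.
        pred = pos + cur_step * np.array([np.cos(heading), np.sin(heading)])
        shift, score = match_profile(reference, extract_profile(pred, heading))
        if score < min_score:
            misses += 1
            cur_step *= 1.5   # larger steps to jump over obstacles
            continue
        # Correct the prediction by the measured across-road shift.
        pos = pred + shift * np.array([-np.sin(heading), np.cos(heading)])
        heading = np.arctan2(pos[1] - path[-1][1], pos[0] - path[-1][0])
        path.append(pos.copy())
        misses, cur_step = 0, step
    return np.array(path)
```

The loop stops, as described above, once several consecutive predictions fail to match the characteristic profile.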

Extraction of areas: Due to the lack of modelled knowledge about objects, the computer-supported extraction of area features is more or less limited to areas that are homogeneous with respect to some attribute. In images, the most common attributes to look at are of course the pixels' grey value, colour and texture. Algorithms that extract homogeneous grey value areas can facilitate the extraction of objects like water areas and house roofs. The most common approach is to let the operator indicate a point on the homogeneous object surface and let an algorithm find the outlines of that surface. An example can be seen in figure 4. It is clear that the results of such an algorithm still require some editing by an operator. Overhanging trees at the left side of the river and trees that cast dark shadows at the right side of the river cause differences between the bounds of the homogeneous area and the river borders as they should be mapped. Similar differences will also arise when using these techniques to extract building roofs. Most objects are not homogeneous enough to allow a perfect delineation. Still, the majority of the lines to be mapped may be in the correct place. Thus, editing the results of such an area feature extraction will often be faster than a completely manual mapping process. Firestone et al. [1996] report the use of this technique for mapping lakeshores. Especially for small-scale mapping this can be very efficient, since the water surface generally appears homogeneous and the disturbing effects of trees along the shoreline, as in the example, may be negligible at small scale. The algorithms used to find the boundaries of a homogeneous area are usually based on the region-growing algorithm [Haralick and Shapiro, 1992]. Starting at the pixel indicated by the operator, this algorithm checks whether an adjacent pixel has similar attributes (e.g. grey value). If the difference is below some threshold, the two pixels are merged into one area. Next, the attributes of another pixel adjacent to this area are examined, and this pixel is also merged with the area if the attribute differences are small. In this way a homogeneous area is grown pixel by pixel. This process is repeated until all pixels that are adjacent to the grown area have significantly different attributes.
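A minimal sketch of this region-growing procedure is given below, assuming a single-band image in a NumPy array. Growing against the running region mean with a fixed threshold is a simplifying assumption, and the function name is illustrative.

```python
import numpy as np
from collections import deque

def grow_region(image, seed, threshold=10.0):
    """Region growing from an operator-supplied seed pixel: 4-connected
    neighbours join the region while their grey value stays within
    'threshold' of the running region mean."""
    region = np.zeros(image.shape, dtype=bool)
    region[seed] = True
    total, count = float(image[seed]), 1
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < image.shape[0] and 0 <= nc < image.shape[1]
                    and not region[nr, nc]
                    and abs(float(image[nr, nc]) - total / count) <= threshold):
                region[nr, nc] = True
                total += float(image[nr, nc])
                count += 1
                queue.append((nr, nc))
    return region
```

The returned boolean mask can then be vectorised into an outline polygon, which the operator edits where trees or shadows have distorted the boundary.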

Extraction of complex objects: As requirements on geographical data are shifting from 2D to 3D and from vector data to object-oriented data, the acquisition of these data with digital photogrammetry is also becoming increasingly three-dimensional and object-based. In particular for the acquisition of 3D objects like buildings and other highly structured objects, the usage of object models can be beneficial. These models contain the topology and the internal geometrical constraints of the object. Their usage relieves the operator from specifying these data within the measurement process and improves the robustness and precision of the data acquisition. A common interactive approach is illustrated in figure 5. After the selection of an appropriate object model by an operator, the operator approximately aligns the object model with the image (left image). In a second step a fitting algorithm is used to find the best correspondence between the edges of the object model and the location of high gradients in the image (middle image). Especially in the presence of neighbouring edges with high contrast (like the windows on the house front in the example), the resulting fit often does not correspond to the desired result and therefore requires one or more additional corrective measurements by the operator (right image). Different approaches are being used to find the optimal alignment of the object model to the image. Fua [1996] extended the snake algorithm described above to the fitting of object models. The energy function is defined as a function of the sum of the grey value gradients along the model edges. Derivatives of this energy function with respect to changes in the co-ordinates of the object corners determine the optimal direction for changes in these co-ordinates, whereas constraints on the co-ordinates ensure that a valid building model with parallel and rectangular edges is maintained. Lowe [1991] and Lang and Schickler [1993] use parametric object descriptions and determine the optimal parameter values by fitting the object edges to edge pixels (pixels with high grey value gradients) and extracted linear edges, respectively. Veldhuis [1998] analysed the approaches of Fua [1996] and Lowe [1991] with respect to their suitability for mapping.
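To make the fitting step concrete, the sketch below scores a projected model by the sum of gradient magnitudes sampled along its edges and improves a four-parameter rectangle model by random local search. This is a deliberately crude stand-in for the constrained optimisation of Fua [1996] or the parametric fitting of Lowe [1991]; the parameterisation (centre row/col, width, height) and all names are assumptions for illustration only.

```python
import numpy as np

def edge_score(grad_mag, corners):
    """Sum of gradient magnitude sampled along the polygon edges of a
    projected object model (the quantity the fitting step maximises)."""
    score = 0.0
    for p, q in zip(corners, np.roll(corners, -1, axis=0)):
        for t in np.linspace(0.0, 1.0, 50):
            r, c = np.round((1 - t) * p + t * q).astype(int)
            if 0 <= r < grad_mag.shape[0] and 0 <= c < grad_mag.shape[1]:
                score += grad_mag[r, c]
    return score

def fit_rectangle(grad_mag, params, steps=200):
    """Greedy local search over a parametric rectangle model given as
    (centre row, centre col, width, height)."""
    def corners(p):
        cr, cc, w, h = p
        return np.array([[cr - h / 2, cc - w / 2], [cr - h / 2, cc + w / 2],
                         [cr + h / 2, cc + w / 2], [cr + h / 2, cc - w / 2]])
    best = np.asarray(params, dtype=float)
    best_score = edge_score(grad_mag, corners(best))
    for _ in range(steps):
        # Perturb all four parameters and keep any improvement.
        candidate = best + np.random.uniform(-1, 1, size=4)
        candidate_score = edge_score(grad_mag, corners(candidate))
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best
```

Because the parameterisation itself enforces parallel and rectangular edges, the constraint mentioned in the text is satisfied by construction in this toy version.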

Semi-automatic measurement techniques as reviewed in this paper clearly improve the efficiency of cartographic feature extraction. In most cases there is a clear interaction between the human operator and one or more measurement algorithms. Prior to the measurement, the task of the operator is to identify the object to be measured, to select the correct object model and algorithm, and to provide approximate values. After the measurement by the computer, the operator needs to correct part of the measurements, since the delineation resulting from the objective of the measurement algorithm often does not correspond with the desired object boundaries. Robustness as well as precision of the semi-automatic measurements can be improved by incorporating knowledge about the topographic features into the measurement process. A clear example of this was already shown for the case of complex object measurement. Further knowledge can be added in the form of constraints between neighbouring houses and roads. Hwang et al. [1986], for example, use the fact that most houses are parallel to a road and that houses are often connected to a road by a driveway. In the case of linear features, many more heuristics can be used to guide the feature extraction. Cleynenbreugel et al. [1990] note that roads usually have no steep slopes and that digital elevation models can therefore be useful for road extraction in mountainous areas. Furthermore, they note that road patterns are often typical of the type of landscape (mountainous, flat rural, urban). Soft bounds on the usually low curvature of principal roads are used in the road tracker described in [Vosselman and Knecht, 1995]. Useful properties of water surfaces are related to height. Fua [1996] extracts rivers as 3D linear features and imposes the constraint that the height of the river decreases monotonically. Furthermore, when lakes are extracted as 3D surfaces, they can often be assumed to be horizontal. The latter constraint can be used to automatically detect delineation errors caused by occluding trees along the lakeshore. To obtain a higher degree of automation through interpretation of the aerial image by computer algorithms, much more knowledge has to be modelled. Knowledge-based interpretation of aerial images and the usage of existing GIS databases within this process is a topic of current research efforts [Kraus and Waldhäusl, 1996, Gruen et al., 1997].
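As a small illustration of the last constraint, the following sketch uses the horizontality of a lake surface to flag suspect shoreline vertices; the tolerance value and the function name are illustrative assumptions.

```python
import numpy as np

def flag_shoreline_errors(heights, tolerance=0.5):
    """Use the knowledge that a lake surface is horizontal to flag
    likely delineation errors: shoreline vertices whose measured height
    deviates from the median lake level by more than 'tolerance' metres
    (e.g. where an occluding tree was traced instead of the shore)."""
    heights = np.asarray(heights, dtype=float)
    level = np.median(heights)
    return np.abs(heights - level) > tolerance
```

The flagged vertices can then be presented to the operator for targeted correction instead of a full manual check of the shoreline.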