Semi-automatic generation of highly detailed textured building models
Undertaken by:
- University of Bonn
Institute of Photogrammetry
Prof. Dr.-Ing. Wolfgang Förstner
The aim of the project is to develop the models, methods, and tools for the semi-automatic, interactive refinement of 3D city models. Starting from models of level of detail 2 (LOD 2), consisting of simple building blocks and roof structures, the refinement leads to models of LOD 3, namely detailed building models with ground edges, stairs, balconies, windows, doors, and textures. The focus is on the transition between building facades and the digital elevation model, the so-called building collar. The main component is a 3D spatial database which integrates vector building models with aerial and terrestrial raster images. A carefully designed user interface supports the interactive resolution of ambiguities in the interpretation of the images. For the integration of geometry and texture, an evaluation function will be developed which balances the inextricable conflict between model and image with respect to visual plausibility.
The novelty of the approach is (1) the specification of high resolution object models which cope with the inherent inconsistencies between the high resolution image data and the envisaged level of detail in a 3D GIS with respect to a plausible visualisation, and (2) the reconstruction of the instantiated object model from crudely geo-referenced, possibly uncalibrated images and its generalization according to the model specification.
In the first two years we develop the 3D object model integrating vector and raster data, and demonstrate the fusion of digital elevation model data, 3D building data, and image data to achieve a consistent and visually plausible object description in the context of building collars. We restrict ourselves to scenes with a low degree of occlusion. In the last year we increase the efficiency of the interpretation process and the complexity of the tackled scenes, especially with respect to occlusions.
Motivation and Overview
The goal of the project is to develop models and acquisition tools for the ground edge of buildings.
Research and development in the last decade has been characterized by the following achievements:
- Highly automatic methods for DEM-generation from laser range sensors and images.
- Semi-automatic tools for 3D-building acquisition at a resolution of 30 cm to 1 m.
- Research into the acquisition of architectural structures at a resolution of 5 cm to 10 cm with promising results.
- Photo-realistic visualization of landscapes at medium scales, of 3D-city models for selected objects, and realistic rendering of arbitrary synthetic 3D-scenes.
Computer Graphics is increasingly applying and developing Computer Vision tools for the acquisition of 3D-models, not only of buildings, but also of humans, vegetation, and industrial objects. Limitations immediately become apparent when visualizing 3D-structures
- of complex real 3D-objects derived from laser or image data, or
- of 3D-data integrated from different sources or of different resolution.
In the context of city models these difficulties become apparent when integrating 3D-building models and DEMs with terrestrial images to achieve realistic views and visualize a walk-through: the ground edge of buildings is in general far from consistent with image data captured from the ground.
This has two basic reasons:
- The inaccuracy of the 3D-geometry of the ground edge, caused by the acquisition of the geometry from aerial surveys, provided the ground edge of a building is well defined.
- The lack of a definition of a ground edge at the scale of images taken from the ground.
The lack of a definition of a ground edge can be identified as the main hindrance for the generation of high resolution city models.
The goal of the project is to develop LOD3-models of buildings and methods for their acquisition. We concentrate on the transition from building facades to the DEM, with the aim of adequately visualizing the path of a human (1) between the terrain or road and the building and (2) around the building for navigation purposes.
The transition ranges from about 1.5 floors up the facade to about 3 to 10 m around the building, either leading to an open area, e.g. a garden, or to a street.
We call this banded area the building collar. The function of the building collar is manifold, e.g. protecting the house from corrosion by water or from damage by plants via sealing layers, connecting the interior and the exterior by doors, stairs (possibly with railings), basement windows, or paths for the fire brigade, or improving the aesthetic appearance by front gardens. Building collars have no fixed structure; they may be very simple, especially in flat terrain, or very complex, especially in hilly terrain. The transition to the open area or roads may be realized just by a change of surface material or by an explicit border, e.g. a wall.
Figure 1: Various instances of building collars showing paths, front gardens, stairs, ramps, entries, walls, and arcades.
The project aims to solve the following two interwoven problems:
- Establishing a generic model for building collars. The goal here is to explicitly describe the structure with respect to the ontology, the 3D-structure, and the mutual constraints between parts of a building collar. The resolution of the model should make it possible (1) to generate objects in a 3D-GIS, seen as a long-term memory, which may be used for visualization, especially for navigation, and (2) to support the interactive data acquisition process based on images, requiring a short-term memory capturing diverse intermediate results.
- Establishing image analysis tools for the semi-automatic 3D-acquisition of building collars. The goal here is to start from partially calibrated and partially oriented digital images, automatically perform the calibration and orientation based on possibly interactively identified control features, geometrically reconstruct the 3D-surface of the object, and segment and interpret the 3D-model on the basis of the generic building collar model. The interactivity is relevant for defining regions of interest, for disambiguating hypotheses, and for verifying intermediate and final results.
The model consists of the following layers:
Ontology for all parts and their mutual constraints
The starting point is the LOD2 model and the model for the DEM, possibly complemented by information about streets. These establish the top nodes in the aggregation and the abstraction hierarchy. They will be refined (1) to the envisaged LOD3 level and (2) to the level which is required for the interpretation of the photogrammetric data. The ontology is intended to contain information about function, especially about paths.
The main effort will be the modeling of the stochastic relations between objects and parts, condensed in a graphical model, likely a Bayesian network, which shows the conditional independencies. The statistics will be taken from real examples as far as possible. This model can be interpreted as an a priori distribution of building collars. Formally, the model m needs to be partitioned into its structure s and the parameters p, leading to
P(m) = P(s, p) = P(p|s)P(s)
Thus the ontology via the structure of the Bayesian network yields a priori information about the expected building collars.
The Bayesian network serves three tasks: (1) the evaluation of intermediate hypotheses, (2) triggering the interpretation process, and (3) predicting not yet interpreted or occluded parts of the scene.
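The factored prior P(m) = P(p|s)P(s) above can be sketched for a toy case. All structure labels and probabilities below are illustrative assumptions, not values estimated from real data.

```python
# Toy sketch of the factored prior P(m) = P(p|s) P(s) for building collars.
# Structure labels and probabilities are illustrative assumptions only.

# P(s): prior over collar structures (which parts are present).
P_structure = {
    "path_only":        0.5,
    "path_stairs":      0.3,
    "path_stairs_wall": 0.2,
}

# P(p|s): discrete parameter priors conditioned on the structure,
# here only the number of stair steps.
P_steps_given_structure = {
    "path_only":        {0: 1.0},
    "path_stairs":      {2: 0.4, 3: 0.4, 5: 0.2},
    "path_stairs_wall": {2: 0.3, 3: 0.5, 5: 0.2},
}

def prior(structure, n_steps):
    """P(m) = P(p|s) P(s) for one instantiated model."""
    return P_steps_given_structure[structure].get(n_steps, 0.0) * P_structure[structure]

print(prior("path_stairs", 3))  # 0.4 * 0.3
```

A real Bayesian network would replace these tables with conditional probability tables learned from surveyed collars, but the factorization into structure and parameters is the same.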
Geometry for all parts and subparts and their mutual relations
The modeling of the geometry refines the structural model. It has to be fine enough to enable classification of geometric and textural structures. In particular, the topological consistency of the geometric model needs to be made explicit.
Here we intend to use a mixture of a constructive solid geometry (CSG) and a boundary representation (BRep) model: the CSG being the link to the ontology, the BRep being the link to the image data.
In a first step we will restrict ourselves to planar, cylindrical, and free-form surfaces, which might be textured. Besides real-valued parameters of parametric models, the geometric model also contains discrete parameters, e.g. the number of steps of a stair, which allow modeling the geometric structure. This separates the usually causal relations of the ontological model from the neighborhood relations, which are typical for geometrical structures and cannot be captured by a Bayesian network.
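The mix of discrete and real-valued parameters can be illustrated with a minimal parametric stair: the discrete step count together with assumed riser and tread dimensions generates the boundary geometry.

```python
# Minimal sketch of a parametric stair model: a discrete parameter (number of
# steps) plus real-valued parameters (riser height, tread depth in metres)
# generate boundary geometry. The default dimensions are illustrative.

def stair_profile(n_steps, riser=0.17, tread=0.29):
    """Return the 2D side profile of a stair as a list of (x, z) vertices."""
    pts = [(0.0, 0.0)]
    x, z = 0.0, 0.0
    for _ in range(n_steps):
        z += riser          # vertical riser edge
        pts.append((x, z))
        x += tread          # horizontal tread edge
        pts.append((x, z))
    return pts

profile = stair_profile(3)
assert len(profile) == 1 + 2 * 3  # one start vertex plus two per step
```

Changing the discrete parameter changes the combinatorial structure of the boundary representation, while the real-valued parameters only deform it; this is the separation the paragraph above refers to.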
Visual appearance and invariances of parts and subparts
At the finest level the model refers to the appearance of the objects in images. This part of the model is required both (1) for the reconstruction process and (2) for the visualization process. Whereas the first task requires crisp constraints, e.g. as the basis for matching algorithms, the second task of visualization needs to be able to cope with weak constraints, e.g. when visualizing parts of the scene which are reconstructed with low accuracy or even just predicted. Here we touch the classical problem of balancing accuracy in geometry against plausibility of texture maps. We will develop and investigate tools for visualizing uncertain scene structures.
Problems and envisaged solutions:
Calibration and Orientation of consumer cameras
A module for calibration and orientation of consumer cameras will be provided by the basis group: starting from affine invariant feature points we perform pairwise image matching, leading to relative orientations between all overlapping image pairs. By checking the consistency of the relative orientations we obtain approximate values good enough for a final bundle adjustment, leading to optimal projection matrices and 3D points.
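The consistency check on relative orientations can be sketched as a cycle-closure test: composing the pairwise relative rotations around a closed image triplet should return (nearly) the identity. For brevity the sketch reduces each rotation to a single in-plane angle; the real module works on full 3D rotations and translations.

```python
# Hedged sketch of the relative-orientation consistency check as angular
# cycle closure over an image triplet. Angles (radians) are illustrative.
import math

# Assumed pairwise relative rotations from image matching.
rel = {(0, 1): 0.10, (1, 2): 0.25, (0, 2): 0.35}

def cycle_error(rel, i, j, k):
    """Angular closure error of the triplet i -> j -> k back to i."""
    err = rel[(i, j)] + rel[(j, k)] - rel[(i, k)]
    return abs(math.atan2(math.sin(err), math.cos(err)))  # wrap to [0, pi]

print(cycle_error(rel, 0, 1, 2))  # ~0 for a consistent triplet
```

Triplets with large closure error indicate wrong pairwise matches and are excluded before the bundle adjustment is started from the surviving approximate values.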
3D-Surface model from multiple images
We expect to obtain the system for 3D-surface reconstruction from C. Strecha, KU Leuven. It yields a depth map, thus a 2.5D representation of the surface. This 2.5D model does not cover the surface completely, both due to occlusions and due to deficiencies of the matching algorithm, e.g. in poorly textured areas. This surface representation might not be sufficient for more complex geometries, e.g. those showing pillars. Therefore we need to fuse multiple 2.5D models to obtain a 3D surface model. This should be done automatically by surface matching methods, taking the 3D geometry and the corresponding texture into account. The result will be a more complete surface model which - in contrast to the initial 2.5D surface models - may have any topology, but will still be incomplete due to occlusions and lack of texture. The main problem will be the gluing of overlapping surface areas coming from different initial 2.5D surfaces. This 3D matching module needs to be developed in the project. Deriving this type of data from images is promising but hard; however, it is cheaper than data acquisition with laser range sensors.
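The simplest part of this fusion, combining registered depth maps with holes, can be sketched as follows. Where both maps are valid the depths are averaged, where only one is valid it is kept, and remaining holes stay open; the hard part named above, gluing surfaces of arbitrary topology, is deliberately left out.

```python
# Minimal sketch of fusing two registered 2.5D depth maps into one more
# complete map. This is an illustration only; it ignores registration and
# the topological gluing problem described in the text.

HOLE = None  # missing depth (occlusion or failed matching)

def fuse(depth_a, depth_b):
    fused = []
    for a, b in zip(depth_a, depth_b):
        if a is HOLE and b is HOLE:
            fused.append(HOLE)          # hole remains
        elif a is HOLE:
            fused.append(b)             # only b observed
        elif b is HOLE:
            fused.append(a)             # only a observed
        else:
            fused.append(0.5 * (a + b)) # both observed: average
    return fused

print(fuse([1.0, HOLE, 2.0], [1.2, 3.0, HOLE]))  # [1.1, 3.0, 2.0]
```

In the project the averaging step would additionally weight the depths by their estimated accuracy and by texture agreement.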
Segmentation and interpretation of the 3D surface model
The core module of the image analysis consists of several modules for segmenting and interpreting the 3D surface model. Segmentation means grouping surface elements into surface patches with respect to geometry and texture. No complete partitioning is aimed at, which is consistent with the incompleteness of the surface representation. In the context of building collars we might distinguish planar and cylindrical patches of man-made objects from free-form surfaces of vegetation. Therefore a classification of surface patches into different classes based on geometry, color, and texture will be developed. Adjacent surface patches might be aggregated into polyhedral surface patches. It is not yet decided whether this segmentation and classification process will be realized using Hidden Markov Random Fields. Using this technology only appears justified if the noise in the data is in a certain range: if it is too low, simpler methods do the same job; if it is too high, the model must be very strong in order to obtain good results.
Interpretation of the surface patches may rely on their classification. It can be seen as a grouping process. In our context we have a wide range of 3D-models, which themselves are structured, e.g. stairs have a free number of steps, and paths around the building may be multiply connected. Grouping cannot in general be expected to be done in one step. We therefore need to develop grouping processes which follow the aggregation hierarchy of the object model. If the object is modeled using a Bayesian network, we are able to trigger the grouping process by the object model directly. We want to investigate how far this can be realized and whether it is effective. The complete segmentation and interpretation process can rely on the approximate LOD2-level 3D-description and an approximate DEM. Especially the direction of walls is useful for performing spatial reasoning, e.g. in a local wall-related coordinate system.
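The grouping step can be sketched in one dimension: adjacent surface elements whose normals agree within a threshold are merged into one patch. A patch with a stable normal is a candidate for a planar, man-made surface; an erratic one for vegetation. The unit 2D normals, the 1D neighborhood, and the threshold are illustrative simplifications.

```python
# Hedged sketch of segmentation by region growing over surface normals.
# Real data would be a 2D mesh with 3D normals; a 1D strip keeps it short.
import math

def segment_by_normal(normals, max_angle=0.1):
    """Split a 1D strip of unit 2D normals into patches of similar orientation."""
    patches, current = [], [0]
    for i in range(1, len(normals)):
        (ax, ay), (bx, by) = normals[i - 1], normals[i]
        angle = math.acos(max(-1.0, min(1.0, ax * bx + ay * by)))
        if angle <= max_angle:
            current.append(i)       # same patch: normals nearly parallel
        else:
            patches.append(current) # orientation jump: start a new patch
            current = [i]
    patches.append(current)
    return patches

flat = [(0.0, 1.0)] * 3
tilted = [(math.sin(0.5), math.cos(0.5))] * 2
print(segment_by_normal(flat + tilted))  # [[0, 1, 2], [3, 4]]
```

Classification of the resulting patches would then look at texture and color in addition to the geometric regularity shown here.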
Evaluation and Control
Evaluation and Visual Plausibility
The evaluation of the final description needs to be realized in two ways:
- internally, in the sense of a self-diagnosis
- externally, in the sense of a performance characterization
The internal evaluation is based on the coherence of model and data. As both model and data are uncertain, we in principle need to maximize the probability of the instantiated model m given the data d. The model is characterized by its structure s and its parameters p. Formally we therefore may maximize the probability
P(m|d) = P(s, p|d) ∝ P(d|m)P(m) = P(d|s, p)P(p|s)P(s)
Here the a priori knowledge P(m) = P(p|s)P(s) is explicitly made dependent on the structure and the parameters, as far as they are relevant. Equivalently, we could minimize the self-information or description length (DL)
I(s, p|d) = I(d|s, p) + I(p|s) + I(s) + const
In case of no a priori knowledge about the structure, e.g. the number of steps of a stair, we might use non-informative prior probabilities, which is the core idea of the classical minimum description length principle.
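Comparing two hypotheses by description length can be made concrete with made-up numbers: a uniform (non-informative) prior over the step count, the same structure prior for both, and assumed likelihoods. All values are assumptions for illustration.

```python
# Illustrative comparison of two model hypotheses by description length
# DL = I(d|s,p) + I(p|s) + I(s), with I = -log2 P. Numbers are assumptions.
import math

def dl(p_data_given_model, p_param_given_structure, p_structure):
    return -(math.log2(p_data_given_model)
             + math.log2(p_param_given_structure)
             + math.log2(p_structure))

# Hypothesis A: 3 steps explains the data well; B: 4 steps fits worse.
# Uniform prior over 1..8 possible step counts, same structure prior.
dl_a = dl(0.40, 1 / 8, 0.5)
dl_b = dl(0.05, 1 / 8, 0.5)
print(dl_a < dl_b)  # the better-fitting hypothesis has the shorter description
```

Because the parameter and structure terms are equal here, the comparison reduces to the data term; with informative priors the prior terms would penalize implausible collar configurations as well.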
In practice we will not be able to find the optimal description, due to the enormous search space of possible instances. However, we can compare different hypotheses based on these evaluation criteria. The main effort therefore is to establish a large enough but manageable search space.
The external evaluation needs to be based on visual inspection and can, in a first step, be based on simple performance measures such as correctness and completeness. These measures, however, do not reflect the relevance of the different parts of the model. In our context of navigation we are interested in visual plausibility. This puts more emphasis on the correct reconstruction and interpretation of the paths from the building to its surroundings, e.g. street or garden. Reconstructions are not plausible if there is no safe path from the building to its surroundings, or around the building, which can be visualized for guidance purposes. In a first attempt we check visual plausibility manually. It might be interesting to formalize at least the first part, checking the existence of a safe path, e.g. by simulating and evaluating the smoothness of a walk on the path.
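The proposed formalization of the safe-path check can be sketched directly: sample terrain heights along the path and reject it if any height jump between consecutive samples exceeds a step limit. The threshold is an assumed value, not a project specification.

```python
# Sketch of a formal safe-path check: a simulated walk along equally spaced
# samples is plausible if no height jump exceeds an assumed step limit.

def path_is_safe(heights, max_step=0.25):
    """heights: terrain heights (m) sampled at equal spacing along the path."""
    return all(abs(b - a) <= max_step
               for a, b in zip(heights, heights[1:]))

print(path_is_safe([0.0, 0.1, 0.2, 0.25]))  # gentle slope -> True
print(path_is_safe([0.0, 0.1, 0.7]))        # 0.6 m jump -> False
```

A refined version would additionally score the smoothness of the whole walk, e.g. by penalizing the sum of squared height differences, rather than returning a binary verdict.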
Control and Interactivity
We do not expect the result to be achieved fully automatically. Therefore we need to provide mechanisms for interactive control. There are at least the following tasks we expect an operator to perform:
Definition of a region of interest
The definition of a region of interest (ROI) is technically the first step of the reconstruction process. It poses no problems and avoids the need to develop automatic screening techniques. The ROI will be used to cut out the relevant 3D-structure from the LOD2 description and the given DEM. It also triggers the selection of the image data relevant for the task. After refinement, the 3D-description will be reinserted into the 3D-building description, requiring some treatment of the transition area between the LOD2 and the LOD3 description.
Defining control features
The given images in general are not precisely geocoded. The identification of control features - mainly points and lines - is difficult to automate. The relative orientation of all images does not provide the exterior orientation, i.e. the position, rotation, and scale of the photogrammetric model. Therefore the operator is responsible for specifying control features.
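What the control features fix can be sketched in 2D: the relative orientation leaves a similarity transform (position, rotation, scale) undetermined, and two operator-identified control points determine its four parameters in closed form. The real problem is the 7-parameter 3D case; 2D keeps the sketch short, using complex numbers to encode rotation and scale.

```python
# Hedged 2D sketch of fixing the exterior orientation from control features.
# A 2D similarity w = a*m + b is encoded with complex numbers: a carries
# rotation and scale, b the translation. Coordinates are illustrative.

def similarity_from_two_points(model_pts, world_pts):
    """Solve w = a*m + b from two point correspondences."""
    m0, m1 = (complex(*p) for p in model_pts)
    w0, w1 = (complex(*p) for p in world_pts)
    a = (w1 - w0) / (m1 - m0)  # rotation and scale
    b = w0 - a * m0            # translation
    return a, b

a, b = similarity_from_two_points([(0, 0), (1, 0)], [(10, 0), (10, 2)])
print(abs(a))  # scale factor of the recovered transform
```

With more than the minimum number of control features, the analogous 3D problem would be solved by least squares, which also yields a check on operator identification errors.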
Disambiguating alternative hypotheses
The automatic procedures for classification and interpretation might lead to ambiguous results, e.g. due to occlusions, discrepancies between data, or incompleteness of models. Then alternative hypotheses are nearly equally likely. Here the operator may be asked to decide.
Providing models for completion
The automatically derived descriptions will be incomplete, mainly because of occlusions. In simple cases the system might complete 3D-structures, based on symmetries or other regularities contained in the model. However, in case of large occlusions the completion will become very uncertain. Then the operator may complete the model by hand, or decide to leave the result incomplete. Parts of the final description which actually are predictions based on the model - of the computer or the operator - will be marked.
Final evaluation and editing
The result of automatic interpretation processes needs at least to be checked for validity. In general some editing is required. Therefore we will develop tools for inspecting and editing the final model. For consistency reasons the editing tools also need to be based on the building model.
The project consists of two phases. The aim of the first phase is to prove the feasibility of the approach and the plausibility of the evaluation function by a running prototype. The benchmark will be the visual plausibility of reconstruction results on selected scenes. The focus lies on the reconstruction of building collars consisting of paths, walls, stairs, and arcades. In order to keep complexity tractable, the prototype will be limited with respect to its capability to cope with occlusions, and it relies on interactive selection and classification of the object types to be reconstructed. Figure 2 shows scenes the prototype shall be able to cope with after the first two years of the project.
In the second phase we shall increase the degree of automation on the one hand and ensure the robust treatment of occluded scenes on the other hand. At the end of the project a system prototype for the semi-automatic acquisition of detailed building models and their transition to the digital terrain model will be available. Practicability will be shown using images of scenes from Bonn and Beijing. The system will be usable with little training effort.
Figure 2: Simple scenes showing building collars with paths, stairs, walls, and arcades.
Main subtasks of the first phase are
- Development of the ontology for building collars including object interrelationships, geometric and topological constraints
- Import and adaption of the 3D spatial database kernel from the neighboring DFG project "Ontological scales for automatic acquisition of landscape models"
- Extension of the 3D spatial database by methods to cut out, modify and replace 3D model parts
- Adaption and extension of the module for dense, textured 3D surface reconstruction from stereo images
- Development of the 3D matching module for the integration of different 3D surfaces
- Development of the concepts for segmentation and grouping of 3D surface models
- Construction of the evaluation function that balances visual plausibility on the one hand and model complexity on the other hand
- Concept for user interaction and development of a GUI for the system prototype
- Integration of all modules within a prototype system and evaluation using simple scenes
The following issues will be treated in the second project phase:
- Derivation of statistical properties (a priori probabilities) for all objects and relations in the ontology
- Development of strategies for controlling the reconstruction process
- Automatic completion of incompletely reconstructed 3D structures
- Urban information.