Institut d'Électronique et de Télécommunications de Rennes


3D modeling of urban environment from ground videos and block data for image based rendering

PhD student: Hengyang WEI.

Thesis supervisors: Luce MORIN, Muriel PRESSIGOUT.

Thesis started on: 05/03/2014.

Modeling a large natural 3D environment (such as a city) so as to provide both free navigation and immersive rendering is still an open issue. Here, immersive means a perception of real 3D (relief and parallax) and of real textures (similar to photographs or videos). Free navigation refers to a navigation mode that allows both global (aerial) and local (ground-based) exploration of the environment.

Numerous solutions are now available to model urban zones at large scale. They rely on GIS databases, images (either terrestrial or aerial), or models built from laser scans. These databases provide 3D models of sufficient quality for many services (e.g. Google Earth, Géoportail). However, these 3D models are poorly suited to ground-based navigation because they lack detail and realism.

On the other hand, some systems rely on the capture of a huge number of photographs taken at ground level. These high-quality 2D views ensure immersive rendering, but they do not allow free navigation: only point-to-point navigation is possible. Google Street View is an example of this kind of system.

Some high-quality 3D models (such as Geosim Philadelphia) provide immersive navigation at ground level while remaining in a 3D environment, but the associated data are produced manually, which prohibits any large-scale deployment (modeling a city takes many months).

Image-based modeling and rendering provide much more flexible solutions to this problem, using only photos and videos. Several research studies show that very high-quality rendered images can be obtained even from very simple 3D models. The best-known works in this field are those of Paul Debevec [1], which use view-dependent texture mapping on a coarse 3D model. The results are impressively realistic, but the coarse model is constructed mostly manually. These works are extended in [2], which uses a structure-from-motion algorithm to initialize a point cloud that guides the user through the modeling process.
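
The core idea of view-dependent texture mapping can be illustrated with a small sketch (a hypothetical simplification, not Debevec's actual weighting scheme): each source camera contributes to the novel view with a weight that decreases with the angle between its viewing direction and the novel one, so cameras that saw the surface from a similar angle dominate the blend.

```python
import numpy as np

def vdtm_weights(novel_dir, camera_dirs, eps=1e-6):
    """Blend weights for view-dependent texture mapping.

    Each source camera is weighted by the inverse of the angle between
    its viewing direction and the novel view's direction; the weights
    are normalized to sum to one before blending the textures.
    """
    novel_dir = novel_dir / np.linalg.norm(novel_dir)
    weights = []
    for d in camera_dirs:
        d = d / np.linalg.norm(d)
        angle = np.arccos(np.clip(novel_dir @ d, -1.0, 1.0))
        weights.append(1.0 / (angle + eps))
    w = np.array(weights)
    return w / w.sum()

# A novel view halfway between two source cameras gets equal weights.
cams = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
w = vdtm_weights(np.array([1.0, 1.0, 0.0]), cams)
```

In a full renderer these weights would be evaluated per polygon (or per pixel), with visibility taken into account before blending.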

Another rendering method, proposed in several studies, consists in using a very simple model (a parallelepiped aligned with the ground and building façades) texture-mapped with a single photograph [3]. Navigation facilities are limited, but the results are still attractive given the simplicity of the modeling process.

To go beyond this last approach – while keeping the advantage of simplicity – one has to use image-interpretation techniques, so as to recognize, for instance, the horizon position, the façade locations, and local occluding objects. This knowledge can then be exploited to refine the coarse 3D model and obtain visually appealing results. Automatic Photo Popup [4] is one such algorithm: it segments a single image into contextual zones (ground, façade, sky, and occluding objects) using a learning database. This algorithm has been integrated into a study on automatic urban reconstruction [5] to reduce the complexity of the matching procedure between images and coarse 3D building models.

Following this approach, the proposed PhD topic aims at modeling urban environments from ground-based videos and 3D block models. The ground-based videos will provide the realistic textures necessary for immersive rendering, while the block model will provide a coarse geometry of the environment.
The first step will be the registration of the video with the block data, which is mandatory for the subsequent fusion and texturing steps. In theory this is a classical camera pose estimation procedure; however, in the context of ground-based videos (a rough 3D model with little geometric information available, small areas of the 3D block model visible in the input views,…), it remains a bottleneck and an unsolved issue.
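
As a sketch of what this registration step optimizes (assuming a simple pinhole camera model; the function and variable names are hypothetical), a candidate pose can be scored by the reprojection error of block-model vertices onto a video frame:

```python
import numpy as np

def project(points_3d, R, t, K):
    """Pinhole projection of world points given pose (R, t) and intrinsics K."""
    cam = R @ points_3d.T + t.reshape(3, 1)   # world -> camera frame
    uvw = K @ cam
    return (uvw[:2] / uvw[2]).T                # perspective division

def reprojection_error(points_3d, points_2d, R, t, K):
    """Mean distance between projected model points and image observations;
    this is the quantity a pose-estimation step would minimize."""
    proj = project(points_3d, R, t, K)
    return float(np.mean(np.linalg.norm(proj - points_2d, axis=1)))

# Toy check: scoring the true pose against its own projections gives zero error.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])    # camera 5 units from the block
corners = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 1]], dtype=float)
obs = project(corners, R, t, K)
err = reprojection_error(corners, obs, R, t, K)  # → 0.0
```

In practice the 2D observations come from features matched in the video, and the pose (R, t) is the unknown being estimated, which is exactly where the scarcity of visible block-model geometry makes the problem hard.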
Structure-from-motion techniques will then be used to enrich the provided geometry with content present in the images but not in the block model, such as trees, cars, or relief on the building façades. The reconstructed model should be designed to optimize realism during image-based rendering rather than fidelity to the 3D geometry of the scene. View-dependent texture mapping as well as view-dependent geometry modeling are envisioned to achieve this goal.
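
The structure-from-motion enrichment ultimately rests on triangulating points matched across frames. A minimal two-view linear (DLT) triangulation, with made-up projection matrices for illustration, might look like:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 projection matrices; x1, x2: matched pixel coordinates.
    Builds the homogeneous system A X = 0 and solves it by SVD.
    """
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # back from homogeneous coordinates

# Two cameras translated along x, both observing the point (0, 0, 4).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.0, 0.0, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
X = triangulate(P1, P2, x1, x2)
```

Points recovered this way (here from synthetic matches) would populate the scene with the geometry the block model lacks; a real pipeline would add feature matching, outlier rejection, and bundle adjustment on top.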

[1] Paul Debevec, Yizhou Yu, and George Boshokov. Efficient view-dependent image-based rendering with projective texture-mapping. Technical Report CSD-98-1003, 1998.
[2] Sudipta N. Sinha, Drew Steedly, Richard Szeliski, Maneesh Agrawala, and Marc Pollefeys. Interactive 3D architectural modeling from unordered photo collections. ACM Trans. Graph., 27(5):1–10, 2008.
[3] Kevin Boulanger, Kadi Bouatouch, and Sumanta Pattanaik. ATIP: A tool for 3D navigation inside a single image with automatic camera calibration. In Proceedings of the EG UK Conference on Theory and Practice of Computer Graphics, pages 71–79, 2006.
[4] Derek Hoiem, Alexei A. Efros, and Martial Hebert. Automatic photo popup. ACM Trans. Graph., 24(3):577–584, 2005.
[5] Gaël Sourimant, Luce Morin, and Kadi Bouatouch. GPS, GIS and video registration for building reconstruction. In ICIP 2007, 14th IEEE International Conference on Image Processing, San Antonio, Texas, USA, September 2007.
